RESTRICTED AMPLICON ANALYSIS



Field of the Invention

The present invention generally provides a method which facilitates the
detection of polymorphisms (or mutations). The method is directed to the analysis of
so-called endonuclease site polymorphisms (ESPs) that result in the gain or loss of a
restriction endonuclease site. In essence, the ESP is probed with the restriction
endonuclease reagent prior to amplification, whereby amplification is prevented and
consequently no signal is observed when cleavage takes place. Unambiguous allele
calling is performed by comparing the signals obtained with and without cleavage with
the restriction endonuclease reagent. The method is particularly useful for multiplex
genotyping, involving the parallel analysis of large numbers of single nucleotide
polymorphisms. Preferred methods for detecting the amplicons involve hybridization
to an arrayed or otherwise identifiable set of cognate probe fragments or
oligonucleotides.

Background of the Invention

Molecular approaches for genetic analyses trace the nucleotide sequence
variation that occurs naturally and randomly in the genomes of all living species.
Knowledge of the DNA polymorphisms among individuals and between populations
is important in understanding the complex links between genotypic and phenotypic
variation. In the absence of complete data about sequence variation, one relies on the
ability to identify 'nearby' markers that allow one to infer the location of certain relevant
loci or causal sequence variations. The informativeness of a marker depends on the
magnitude of the linkage disequilibrium. Markers can be used in linkage studies to
search for candidate genes and in association studies to identify the functional allelic
variation in candidate genes that influences inter-individual variation.

The vast majority of sequence variation consists of nucleotide
substitutions, often referred to as single nucleotide polymorphisms (SNPs), resulting
from mutations that have accumulated during evolution. Most of these nucleotide
changes are genetically silent, i.e., they have no measurable biological effect, but
they provide an immense reservoir of variation in DNA structure. Most methods for genetic
analysis used today rely on the detection of nucleotide sequence variation which can
be measured by DNA fragment analysis using electrophoretic separation, in which
DNA fragments are fractionated based on size or conformation. Occasionally the
nucleotide sequence variation will affect either the presence of the DNA fragment or
its mobility. In this way the primary nucleotide sequence variation will give rise to
an easily detectable DNA fragment polymorphism. Since polymorphic DNA fragments
are derived from precise locations on the organism's genome, they can serve as reliable
genetic markers, or landmarks, to identify and locate genes.

A host of assays to detect DNA polymorphisms, and SNPs in particular,
have been developed. In some of these assays (e.g., RFLP [Botstein, D., White, R.L.,
Skolnick, M., Davis, R.W., Am. J. Hum. Genet. 32:314-331 (1980)], CAPS
[Konieczny, A., Ausubel, F.M., Plant J. 4:403-410 (1993)], dCAPS [Neff, M.M., Neff,
J.D., Chory, J., Pepper, A.E., The Plant Journal 14:387-392 (1998)], PIRA
[Steinborn, R., Müller, M., Brem, G., Biochim. Biophys. Acta 1397:295-304 (1998)]),
restriction enzymes are used to detect polymorphic nucleotide sequences that affect
cleavage. The specificity of restriction enzymes is such that they exhibit a unique
sensitivity to detect single nucleotide differences occurring in their recognition sites.

The principal strengths of restriction enzyme-based genetic analyses are the ease of use
and the robustness of the assays. In the majority of cases, the restriction site
polymorphism is used to detect known, previously identified SNPs, and the assay
consists of an electrophoretic fragment analysis. In one report, the allelic variation
is detected in a solid-phase ELISA-type setting [Truett, G.E., Walker, J.A., Wilson,
J.B., Redmann, S.M. Jr., Tulley, R.T., Eckardt, G.R., Plastow, G., Mamm. Genome
9:629-632 (1998)].

In WO 91/17269, Lerner et al. describe a different method for marking
a eukaryotic chromosome by restriction endonuclease mapping of discrete DNA
sequences which are complementary to a region of a eukaryotic chromosome.

Vos et al., Nucl. Acids Res. 23:4407-4414 (1995) and EP 0 534 858
describe a technique for DNA fingerprinting called AFLP, which is based on the
selective polymerase chain reaction-based amplification of restriction fragments of a
digest of genomic DNA. The amplification reaction depends on the use of primers that
extend into the restriction fragments, amplifying only those fragments in which the primer
extensions match the nucleotide sequences flanking the restriction sites.

Another method utilizing DNA amplification steps is set out in Williams
et al., Nucl. Acids Res. 18:6531-6535 (1990), who describe a DNA fingerprinting
method termed random amplified polymorphic DNA.

DNA amplification fingerprinting was described by Caetano-Anollés in
Bio/Technology 9:553-557 (1991). Still another fingerprinting technique, called
arbitrarily primed PCR, was described in Welsh et al., Nucl. Acids Res. 18:7213-7218
(1990) and Welsh et al., Nucl. Acids Res. 19:861-866 (1991).

In WO 94/11530, Cantor et al. describe materials and methods for
positional sequencing by hybridization. Cantor et al. also describe methods for
creating arrays of DNA probes useful in the practice of their method.

The major shortcoming of the current methods of genetic analysis is the
limited resolution of the DNA fragment analysis systems, namely the number of DNA
fragments that can be separated in a single assay. Generally the fractionation resolution
ranges from tens to a couple of hundred DNA fragments at most. Consequently,
current genetic analysis methods are limited to a few hundred to a thousand genetic
markers. While this resolution has been sufficient for analyzing simple genetic traits
determined by single genes, the analysis of complex traits involving several or many
different genes, which is now being undertaken, will require the analysis
of a much larger number of genetic markers. It is anticipated that such studies will
require from a few thousand to possibly several hundred thousand genetic markers.
Although this could conceivably be accomplished by performing many parallel assays,
such scaling up will be cost- and labor-prohibitive.

A technology that has great potential and which is generating widespread
interest is the so-called micro-array technology (DNA chips). In general, these
methods are based on measurement of the hybridization of DNA sequences in solution
to probe sequences that are arrayed on a solid surface. When assaying nucleotide
polymorphisms, the detection relies on the small differences in hybridization efficiency
between two different DNA sequences. In one format, fluorescently labeled sample
DNA is hybridized to dense arrays of probe nucleic acids, the sequence-specific
hybridization signal is detected by scanning confocal microscopy, and DNA variants are
scored as (predictable) differences in the hybridization pattern. The micro-arrays are
fabricated either by in-situ light-directed oligonucleotide synthesis [Fodor, S.P.A. et
al., Science 251: 767 (1991)] or by spotting DNA (off-chip synthesized
oligonucleotides or PCR fragments) in an automated procedure. The technology has
already been demonstrated in the scoring of mutations in mitochondrial DNA [Chee,
M. et al., Science 274: 610-614 (1996)], the HIV genome [Lipshutz, R.J. et al.,
Biotechniques 19: 442-447 (1995)], the CFTR cystic fibrosis gene [Cronin, M.T. et
al., Human Mut. 7: 244-255 (1996)], the BRCA1 breast cancer gene [Hacia, J.G. et
al., Nat. Genet. 14: 441-447 (1996)] as well as the entire yeast genome [Winzeler,
E.A. et al., Science 281: 1194 (1998)]. In comparison with most other assays,
micro-arrays provide a platform for high-throughput, massively parallel polymorphism
detection.

A major disadvantage of the use of microarrays relates to the
complexity of the hybridization reaction. The detection relies on the very small
difference in hybridization of DNA sequences differing by only one nucleotide. In
general, a set of 4 oligonucleotides, differing only in the identity of the central base,
is synthesized for each position in the target sequence that has to be interrogated. In
practice, the number of oligonucleotides needed to correctly genotype one SNP is much
larger, involving up to 56 different oligonucleotides spanning the variable base [Wang
et al., Science 280: 1077-1082 (1998)]. The degree of redundancy is also dramatic if
one wants to screen the target DNA for all possible mutations; the design then includes
overlapping oligonucleotide sets that are offset by one base (a process known as tiling).
It should be noted that the detection of SNPs by hybridization to arrays depends on the
use of short oligonucleotide probes. With longer probes, such as DNA fragments in the
size range of 50 to 500 base pairs or larger, it is not possible to distinguish the SNP
alleles.

Summary of the Invention

The present invention is directed to methods for genotyping
polymorphisms that result in the gain or loss of an endonuclease cleavage site. Such
polymorphisms are referred to hereinafter as endonuclease site polymorphisms (ESPs).
Polymorphisms detectable according to the methods of the present invention include
single nucleotide polymorphisms (SNPs). The methods of the present invention exploit
the high discriminatory power of restriction enzymes in a "Restricted Amplicon Assay"
(RAA), which generally comprises the following steps (see Figure 1):
(a) isolating sample DNA;

(b) deriving a set of target DNA fragments, said set of target
fragments comprising concomitantly amplifiable target DNA fragments from the
sample DNA;

(c) treating the target DNA fragments obtained in step (b) with a probe
restriction endonuclease reagent;

(d) amplifying the amplifiable, probe restriction endonuclease-treated
target DNA fragments of step (c); and

(e) analyzing the DNA of step (d) to determine which target
fragments are amplified and/or which target fragments are not amplified; wherein
amplified target fragments lack a recognition site for the probe restriction endonuclease
reagent and target fragments having a recognition site for the probe restriction
endonuclease reagent are not amplified.
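
Steps (c) to (e) can be illustrated with a toy, in-silico model. The sketch below is a minimal illustration under stated assumptions, not the assay itself: each target fragment is represented as a DNA string spanning the region between its primer sites, digestion with the probe reagent is assumed to be complete, and every uncut fragment is assumed to amplify. The function names and the example locus are hypothetical; MseI (recognition site TTAA) serves as the example probing enzyme.

```python
"""Toy in-silico model of RAA steps (c)-(e).

Assumptions (illustrative only): each fragment string spans the region between
its primer sites, probe digestion is complete, and every uncut fragment amplifies.
"""
from typing import Dict


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_site(fragment: str, site: str) -> bool:
    """True if the recognition site occurs on either strand of the fragment."""
    return site in fragment or reverse_complement(site) in fragment


def restricted_amplicon_assay(fragments: Dict[str, str], probe_site: str) -> Dict[str, bool]:
    """Steps (c) and (d): digest, then amplify; return fragment name -> amplified?

    A fragment carrying a probe site is cut between its primer sites and yields
    no amplicon; the returned flags are the presence/absence readout of step (e).
    """
    return {name: not has_site(seq, probe_site) for name, seq in fragments.items()}


if __name__ == "__main__":
    # Two alleles of one hypothetical locus; allele B carries a T->C change
    # that destroys the MseI site (TTAA), so only allele B is expected to amplify.
    fragments = {"locus1_alleleA": "ACGTTTAACGGA", "locus1_alleleB": "ACGTTCAACGGA"}
    print(restricted_amplicon_assay(fragments, "TTAA"))
```

Checking both strands matters only for non-palindromic recognition sites; for palindromic sites such as TTAA the reverse complement is the site itself.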

In one aspect, the present invention is directed to RAA methods which
comprise the preparation of concomitantly amplifiable DNA segments by digestion of
the starting DNA with one or more restriction endonucleases, collectively referred to
herein as sampling enzymes. This method is herein referred to as format-I RAA and
is diagrammed in Figure 2. The digested starting DNA may be further modified at its
termini by the addition of adapters, which may serve to prime an amplification reaction
(see Figure 2). Once sample DNA is obtained, it is treated with a different restriction
enzyme, the probing enzyme, also referred to as a probe restriction endonuclease
reagent. A combination of probing and sampling enzymes is chosen such that a
substantial fraction of the sample fragments contain a single recognition site for the
probe endonuclease reagent. In general, probe enzymes used with format-I RAA
preferably have as a recognition site a nucleotide sequence of less than six nucleotides.

In another aspect, the present invention is directed to methods for
format-II RAA for the detection of ESPs, as diagrammed in Figure 3. Format-II RAA
operates on the same principle as format-I RAA, except that the sample amplicons need
not be restriction fragments, but are rather defined regions of a genome amplifiable with
specific primer pairs. The amplicons of format-II RAA are identified on the basis
of sequence data, e.g. the sequence of ESP-containing restriction fragments identified
using the format-I RAA method, or otherwise known SNPs affecting endonuclease cleavage
sites. In format-II RAA, the test DNA to be analyzed is treated with a probe restriction
endonuclease reagent, followed by the concomitant amplification of regions of the
treated DNA (amplicons) with predetermined primers, using, for example, the
polymerase chain reaction as described herein. The analysis of the amplification
products then proceeds as described herein for format-I RAA.
As with format-I RAA, an ESP is genotyped by the presence or absence of a
recognition site for the probe restriction endonuclease reagent.
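
Selecting format-II amplicons from sequence data amounts to asking, for each known SNP, whether the two alleles differ in the presence of a probe-enzyme site within the amplicon. The sketch below assumes that the amplicon sequences of both alleles are known and uses exact recognition-site matching on both strands. The function names and the small enzyme table are illustrative assumptions; the table lists the four-base sites of MseI (TTAA), AluI (AGCT) and Tsp509I (AATT), the probing enzymes named for Figure 8.

```python
"""Screen a known SNP for ESP status against candidate probing enzymes (sketch)."""
from typing import Dict

# Recognition sites of a few four-base probing enzymes (illustrative subset).
PROBE_SITES = {"MseI": "TTAA", "AluI": "AGCT", "Tsp509I": "AATT"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the site occurs on either strand of the sequence."""
    return site in seq or reverse_complement(site) in seq


def screen_snp(ref_amplicon: str, alt_amplicon: str,
               sites: Dict[str, str] = PROBE_SITES) -> Dict[str, str]:
    """Return enzyme -> expected format-II outcome, for enzymes that make the SNP an ESP.

    The SNP is an ESP for an enzyme only if exactly one allele's amplicon carries
    a site; the allele without a site is the one expected to amplify after digestion.
    """
    result = {}
    for enzyme, site in sites.items():
        ref_cut = has_site(ref_amplicon, site)
        alt_cut = has_site(alt_amplicon, site)
        if ref_cut != alt_cut:
            result[enzyme] = "alt allele amplifies" if ref_cut else "ref allele amplifies"
    return result


if __name__ == "__main__":
    # Hypothetical amplicon pair: the A->G change destroys an MseI site.
    print(screen_snp("GGCTTAAGCC", "GGCTTGAGCC"))
```

Because the whole amplicon is scanned, a SNP is correctly rejected when both alleles are cut at some other, monomorphic site inside the amplicon.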

In yet another aspect, the present invention is directed to methods for
format-III RAA. In essence, format-III RAA consists of a combination of the format-I
and format-II approaches. One such combination is diagrammed in Figure 4. Test
DNA, digested or not with a probe endonuclease reagent, is sampled with a pair of
endonuclease reagents, and the resulting fragments are co-amplified as described for the
format-I assay (this step is referred to as the pre-amplification step). These
pre-amplification mixtures are, in turn, used as templates for a format-II type of PCR
reaction in which multiple ESP-containing regions are selectively co-amplified using
specific primer sets. The analysis of the amplification products then proceeds as
described before. The advantages of format-III RAA are that the stepwise
amplification facilitates the multiplex PCR of the ESP-specific amplicons and lowers
the amount of starting material required to interrogate all the ESPs.

Arrays, or microarrays, of probe DNA, wherein the probe DNAs are
useful in the detection of ESPs, are also encompassed by the present invention.
Informative probe DNAs are prepared and identified as described in detail below and
are then attached to a substrate for use in the hybridization reactions with concomitantly
amplifiable DNA after treatment with a probe restriction endonuclease reagent and
subsequent amplification.
Since the method of the invention is based on the detection of a
particular kind of DNA polymorphism, which occurs in the DNA of any organism, the
invention will be universally applicable. The methods of the present invention may be
used to genotype ESPs in a wide variety of organisms, from prokaryotic organisms
such as bacteria through complex eukaryotic organisms, as well as viruses or any
organism having a genome, however simple or complex. The methods may also be used for the
analysis of extrachromosomal DNA, the DNA found in certain cellular organelles,
cDNA preparations, or DNA libraries, such as yeast artificial chromosome libraries
and others. Furthermore, based on the large body of DNA sequence data at hand, it
is predicted that the genomes of higher organisms carry several hundreds of thousands
of such DNA polymorphisms. Consequently, the new method is capable of diagnosing
the immense number of genetic markers that are needed to unravel complex traits. The
method is of tremendous value for high-throughput genetic analysis in the emerging
field of pharmacogenomics. Similarly, the method has great potential in the field of
animal and plant breeding, where high-resolution genetic analysis will be needed to
identify the genes involved in quantitative agronomic traits.

Various aspects of the present invention are described in more detail
below (see Detailed Description of the Invention). Variations in each of these aspects
will be readily appreciated by one of ordinary skill in the art and are within the scope
of the invention.


Brief Description of the Drawings
Figure 1 depicts the general concept of the Restricted Amplicon Assay.
The vertical arrows indicate the positions of the ESPs. The open circles denote the
probing enzyme sites that are present, while the closed circles denote the mutated sites.
The first step involves cleavage of the test DNA with the probing endonuclease. The
second step involves PCR amplification of DNA segments comprising the ESPs. The
small horizontal arrows denote the PCR primers flanking the ESPs. When cleavage
occurs, the DNA is cut between the PCR primers, preventing the subsequent
amplification of the DNA segment comprising those ESPs. Only those DNA segments
that were not cleaved are amplified. The final step comprises assaying the amplicons.

Figure 2: Diagrammatic representation of format-I RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the
sampling enzyme cleavage sites. Step 2 represents the adapter ligation step. The open
lines represent the adapters ligated to the ends of the sampled restriction fragments.
Step 3 represents the probing enzyme cleavage step, and the small horizontal arrows
denote the PCR primers matching the adapter sequences. Step 4 represents the PCR
amplification step, in which only the sample fragments that are not cleaved by the
probing enzyme are amplified. The crossed circles represent the fragments that are not
amplified.

Figure 3: Diagrammatic representation of format-II RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
probing enzyme cleavage step. The dotted boxes denote the DNA sequences flanking
the ESP sites. Step 2 represents the PCR primer design. The small horizontal arrows
denote the PCR primers flanking the ESPs. Step 3 represents the PCR amplification step,
in which only the sample fragments that are not cleaved by the probing enzyme are
amplified. The crossed circles represent the fragments that are not amplified.

Figure 4: Diagrammatic representation of format-III RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the
sampling enzyme cleavage sites. Step 2 represents the pre-amplification step, in which
the sampled fragments are amplified. Step 3 represents the probing enzyme cleavage
step. Step 4 represents the PCR primer design. The small horizontal arrows denote the
PCR primers flanking the ESPs. Step 5 represents the PCR amplification step, in which
only the sample fragments that are not cleaved by the probing enzyme are amplified.
The crossed circles represent the fragments that are not amplified.

Figure 5: Graphic representation of target fragments produced by
cleavage with a hexacutter (full arrows) and a tetracutter (dotted arrows) restriction
enzyme. Two types of fragments are produced: type I fragments (dotted lines) carrying
two tetracutter ends, and type II fragments (full lines) carrying one hexacutter end
(represented by the arrowhead) and one tetracutter end. Upon PCR amplification, only
the type II fragments are amplified.
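
The fragment types of Figure 5 can be reproduced by an in-silico double digest. A minimal sketch, assuming EcoRI (GAATTC) as the hexacutter and BfaI (CTAG) as the tetracutter (the pair used for Figure 6); cut positions are approximated by the start of each recognition site, and because both sites are palindromic a single strand is scanned. The function names are hypothetical.

```python
"""Classify double-digest fragments by end type, as in Figure 5 (sketch)."""
import re
from collections import Counter
from typing import List

RARE_SITE = "GAATTC"      # EcoRI, a hexacutter (palindromic)
FREQUENT_SITE = "CTAG"    # BfaI, a tetracutter (palindromic)


def site_positions(seq: str, site: str) -> List[int]:
    """Start positions of all (possibly overlapping) occurrences of a site."""
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def classify_fragments(seq: str, rare_site: str = RARE_SITE,
                       frequent_site: str = FREQUENT_SITE) -> Counter:
    """Count internal fragments by the kinds of cut sites at their two ends.

    type I  : frequent-cutter ends on both sides
    type II : one rare-cutter end and one frequent-cutter end
    The partial fragments at the two ends of the input sequence are ignored.
    """
    cuts = sorted([(p, "rare") for p in site_positions(seq, rare_site)]
                  + [(p, "frequent") for p in site_positions(seq, frequent_site)])
    counts = Counter()
    for (_, left), (_, right) in zip(cuts, cuts[1:]):
        ends = {left, right}
        if ends == {"frequent"}:
            counts["type I"] += 1
        elif ends == {"rare", "frequent"}:
            counts["type II"] += 1
        else:
            counts["rare-rare"] += 1
    return counts


if __name__ == "__main__":
    seq = "CTAG" + "C" * 10 + "GAATTC" + "C" * 10 + "CTAG" + "C" * 10 + "CTAG"
    print(classify_fragments(seq))  # one type I and two type II fragments
```

Each rare-cutter site is normally flanked by a frequent-cutter site on both sides, so the type II count comes out close to twice the number of rare-cutter sites, which is the approximation used in the Detailed Description.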

Figure 6: EcoRI-BfaI fragments from ecotypes Columbia (C) and
Landsberg (L) obtained after selective amplification using EcoRI and BfaI AFLP
primers with 2 and 3 selective nucleotides, respectively. The fragment patterns were
obtained without probing enzyme (no enzyme) and after digestion with the
MseI probing enzyme, respectively. Note that most of the larger fragments do not survive
MseI digestion, while the majority of the smaller fragments survive the treatment. The
differences between the ecotypes Columbia (C) and Landsberg (L) observed after MseI
digestion, marked by the arrows, represent ESP-carrying fragments. The differences
found without MseI digestion, marked by the stars, represent typical AFLP
polymorphisms.

Figure 7: Hybridization patterns obtained on the Arabidopsis micro-arrays.
The layout of the Arabidopsis micro-array is as follows: the left panel contains
the ESP fragment probes derived from Columbia (upper half) and Landsberg (lower
half), while the right panel contains the control monomorphic probes, with
the negative control fragments (-control) always carrying a probing endonuclease site
and the positive control fragments (+control) carrying no probing endonuclease site.
The upper part of the figure shows the hybridization patterns obtained with uncleaved
sample DNA, while the lower part of the figure shows the hybridization patterns
obtained with cleaved sample DNA. The circle code is as follows: light-grey
circles represent hybridization with the Cy3-labeled fragments, dark-grey circles
represent hybridization with the Cy5-labeled fragments, black circles represent
hybridization with both the Cy3-labeled and the Cy5-labeled fragments, and open
circles represent no hybridization. A set of idealized results is presented in this figure.
The hybridization patterns with the uncleaved sample DNA show that all
probes detect sequences in both ecotypes, while the hybridization patterns with the
cleaved sample DNA show that the ESP fragment probes detect only the sequences in
the respective ecotypes from which the ESP fragments were isolated. In addition,
fragments carrying no site for the probing enzyme detect sequences in both ecotypes,
while fragments that always carry a site for the probing enzyme do not show a
hybridization signal.

Figure 8: Hybridization patterns obtained on the corn micro-arrays. The
layout of the corn micro-array is as follows: the left panel of probes contains random
fragments derived from B73, while the right panel contains Mo17 fragments. The
figure shows four hybridization patterns, obtained with uncleaved sample DNA and with
MseI-cleaved, Tsp509I-cleaved and AluI-cleaved sample DNA, respectively. The
uncleaved sample DNA hybridization pattern shows probes that hybridize only to B73
(light-grey circles) or only to Mo17 (dark-grey circles) fragments, which represent
polymorphisms resulting from mutations in the sampling enzyme recognition sites. The
cross in the circle indicates that these probes are eliminated from the analysis. The
cleaved sample DNA hybridization patterns show that the majority of the probes do not
give a hybridization signal, indicating that their cognate fragments are cleaved by the
probing enzyme. Most of the probes giving a signal hybridize to both sample DNAs.
Those that hybridize to only one of the sample DNAs and that were not eliminated
represent fragments carrying ESPs. The arrows denote the probes that were retained
for further analysis.
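
The per-probe interpretation behind Figures 7 and 8 can be summarized as a small decision procedure. A minimal sketch, assuming each probe's signal has already been reduced to a present/absent call for each of two lines (for example B73 and Mo17), with and without probe-enzyme digestion; the category labels and the function name are hypothetical.

```python
"""Per-probe interpretation of two-line hybridization calls (sketch of Figs. 7-8 logic)."""
from typing import Tuple

Calls = Tuple[bool, bool]  # (signal with line 1, signal with line 2)


def classify_probe(uncleaved: Calls, cleaved: Calls) -> str:
    """Interpret one probe from present/absent calls before and after probe digestion."""
    if uncleaved[0] != uncleaved[1]:
        # Polymorphic without probing: a sampling-site (AFLP-type) polymorphism.
        return "eliminated"
    if not any(uncleaved):
        return "no signal"
    if cleaved[0] != cleaved[1]:
        # Signal survives digestion in one line only: the fragment carries an ESP.
        return "ESP"
    if all(cleaved):
        return "monomorphic, no probing site"   # behaves like a +control fragment
    return "monomorphic, always cleaved"        # behaves like a -control fragment


if __name__ == "__main__":
    # ESP: the line 1 allele lacks the probing site and survives digestion.
    print(classify_probe((True, True), (True, False)))
```

The order of the checks mirrors the text: sampling-site polymorphisms are removed first, so that a one-line signal after digestion can be attributed to the probing enzyme alone.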

Detailed Description of the Invention

The term "SNP" means Single Nucleotide Polymorphism, i.e. a 
polymorphism involving the mutation of a single base-pair. 

The term "ESP" means Endonuclease Site Polymorphism, i.e. a
polymorphism involving two alleles, one of which is cleaved by an endonuclease
reagent while the other exhibits (at least partial) resistance to cleavage by the same
endonuclease under the same conditions.

The phrase "(restriction) endonuclease reagent" refers to a reagent that
consists of one or more enzymes and that cleaves nucleic acids with a certain
specificity, i.e. cleavage involves recognition of a particular sequence or set of
sequences in the target DNA. Endonuclease reagents include but are not limited to the
common type II restriction enzymes.

The term "sampling endonuclease(s)" or "sampling enzyme(s)" refers to
an endonuclease reagent used to derive sets of fragments from the sample DNA.

The term "probing endonuclease(s)" or "probing enzyme(s)" refers to an endonuclease
reagent used to probe the allelic state at specific ESP sites.

The term "polymorphism" refers to the existence of two or more alleles
at significant frequencies (≥1%) in the population; polymorphism at a single
chromosomal location constitutes a genetic marker.

The term "micro-satellite (DNA)" refers to a small array (often less than
0.1 kb) of tandem repeats of a very simple sequence, often 1 to 4 base pairs. Variability
at such a locus is the basis of many genetic markers.

The term "mutation" means a heritable alteration in the DNA sequence.

The term "allele" refers to one of several alternative sequence variants
at a specific locus.

The term "genotype" is commonly known to mean (i) the genetic
constitution of an individual, or (ii) the types of alleles found at a locus in an individual.

The term "haplotype" refers to the genotype at a series of linked loci on
a single chromosome.

The term "sample DNA" or "sample fragments" refers to the set of 
fragments or amplicons derived from the starting DNA by the RAA method. 

The term "zygosity" refers to the homozygous or heterozygous state. 

The term "homozygosity/homozygous" refers to the presence of identical
alleles at a locus.

The term "heterozygosity/heterozygous" refers to the presence of
different alleles at a locus.

The term "CpG" means a dinucleotide with a cytosine at the 5'-side and
a guanine at the 3'-side. CpG is relatively rare in mammalian DNA because of the
tendency for the cytosine to be methylated and subsequently mutate to thymine by
deamination.
The term "ecotype" refers to a naturally occurring (plant) variety or race.

The term "bi-allelic" refers to a polymorphic locus characterized by two
different alleles.

The terms "microarray" and "(DNA-)chip" refer to a multitude of
spatially addressable nucleic acids that serve as probes. The microarray may be used
in the form of a planar solid support, a bead, a sphere, or a polyhedron. Fabrication
is done either by in situ combinatorial synthesis of oligonucleotides using
photolithography, or by robotic spotting of off-chip prepared DNA onto a solid
surface.

The methods of the present invention differ conceptually from
previously described restriction enzyme-dependent assays (supra) that essentially detect
a fragment length polymorphism. With the present method, starting DNA is restricted
prior to the amplification reaction and, rather than analyzing the obtained amplification
product, the presence or absence of amplification is measured to determine the allelic
state at an ESP site. The treated DNA is preferably amplified by using a polymerase
chain reaction and is preferably analyzed by means of hybridization against arrays of
probe DNAs. With the present method, a sample amplicon, and consequently a
hybridization signal, is either present or virtually absent. This feature represents a
major advantage, both because it allows a more accurate distinction between variable
nucleotides than is possible by differential hybridization to allele-specific
oligonucleotides, and because it greatly facilitates the identification of a set of generally
useful hybridization conditions. Also, the methods of the invention permit the use of
both oligonucleotides and DNA fragments as probe DNAs. While hybridization
to arrays allows the simultaneous analysis of a large number of ESPs, it should be clear
that the amplification of sample DNA treated with probe restriction endonuclease
reagent can be analyzed by any of a variety of methods well known in the art. In these
methods, an ESP is identified either by the presence of a recognition site for the probe
restriction endonuclease reagent (which will result in the failure of the sample DNA to
amplify) or by the loss of a recognition site, which will allow amplification of an
otherwise unamplifiable sample DNA. Alternative methods include, but are not limited
to, gel-electrophoretic analysis and the TaqMan assay [Holland, P.M. et al., Proc.
Natl. Acad. Sci. 88: 7276-7280 (1991); with the latter assay, detection is done during
rather than after the amplification reaction].

One of the advantages of the method of the invention is the ability to
calibrate the measured signal against that obtained in a control experiment in which
digestion with the probe restriction endonuclease reagent is omitted. Comparison of the
respective hybridization signals, following various corrections and normalization
procedures, is essential for the genotyping of ESPs and the accurate determination of
the zygosity. The cleaved and uncleaved material can, in principle, be hybridized
separately, but a preferred method consists of hybridizing a mixture of the differentially
labeled samples to the same array. The present invention is exemplified by several
specific formats described below.
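
The calibration against a no-digestion control can be sketched for a two-color array on which the cleaved and uncleaved samples are hybridized together. This is a minimal illustration under assumptions not stated in the text: ratios are rescaled so that positive-control probes (fragments without a probing site, like the +control fragments of Figure 7) have a median ratio of 1, the organism is diploid, and the call thresholds of 0.25 and 0.75 are hypothetical placeholders that would need empirical tuning.

```python
"""Normalize cleaved/uncleaved signal ratios and call diploid ESP genotypes (sketch)."""
from statistics import median
from typing import Dict, Iterable, Tuple


def normalized_ratios(signals: Dict[str, Tuple[float, float]],
                      positive_controls: Iterable[str]) -> Dict[str, float]:
    """signals: probe -> (cleaved signal, uncleaved signal) from one two-color array.

    Ratios are rescaled so that the positive-control probes, which carry no
    probing site and should survive digestion, have a median ratio of 1.
    """
    raw = {p: cut / uncut for p, (cut, uncut) in signals.items() if uncut > 0}
    scale = median(raw[p] for p in positive_controls)
    return {p: r / scale for p, r in raw.items()}


def call_genotype(ratio: float, low: float = 0.25, high: float = 0.75) -> str:
    """Expected ratios: ~0 if both alleles carry the site, ~0.5 if one does, ~1 if none."""
    if ratio < low:
        return "homozygous, site present"
    if ratio < high:
        return "heterozygous"
    return "homozygous, site absent"


if __name__ == "__main__":
    # Hypothetical intensities; the controls show the cleaved channel at half intensity.
    signals = {"ctrl1": (1000, 2000), "ctrl2": (1100, 2000), "ctrl3": (900, 2000),
               "espA": (20, 1800), "espB": (480, 1900), "espC": (950, 1900)}
    ratios = normalized_ratios(signals, ["ctrl1", "ctrl2", "ctrl3"])
    for probe in ("espA", "espB", "espC"):
        print(probe, round(ratios[probe], 2), call_genotype(ratios[probe]))
```

A heterozygote is expected near 0.5 because only one of its two alleles survives digestion; the control-based rescaling absorbs dye and labeling differences between the two channels.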

(i) Format-I RAA: Choice of sampling and probing restriction
endonuclease reagents. In one of its embodiments, the present invention is directed
to methods for detecting ESPs in a "restricted amplicon assay" (RAA), which comprises
preparing concomitantly amplifiable restriction fragments from the starting DNA
(sample DNA). When generating discrete sets of DNA fragments from genomic DNA,
the following parameters are important: the average fragment size and the total number
of fragments. The optimal fragment size for use in the methods (and materials) of the
present invention is a trade-off: the fragments must be sufficiently small for
amplification with roughly equal efficiency (in general <500 base pairs) and large
enough to have on average one cleavage site for the probing endonuclease reagent.
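
The requirement that sample fragments carry on average about one probing site can be made concrete with an idealized model. This model is an assumption, not part of the described method: probing and frequent-cutter sites are treated as independent random sites along the DNA, with the probing reagent cutting r times as often as the frequent-cutter sampling enzyme. Walking from one frequent-cutter site, each next site is then a probing site with probability r/(1+r), so the number of probing sites per fragment is geometric: P(0) = 1/(1+r), P(1) = r/(1+r)^2 and P(2 or more) = (r/(1+r))^2.

```python
"""Idealized count of probing-enzyme sites per sample fragment (sketch)."""
from typing import Tuple


def probe_cut_distribution(r: float) -> Tuple[float, float, float]:
    """Return (P(no probing site), P(exactly one), P(two or more)).

    r is the ratio of probing to sampling (frequent cutter) cleavage frequencies.
    Walking along the DNA from one frequent-cutter site, each next site is a
    probing site with probability q = r / (1 + r), so the number of probing
    sites before the next frequent-cutter site is geometric.
    """
    q = r / (1 + r)
    p_none = 1 - q          # = 1 / (1 + r): fragment survives and amplifies
    p_once = p_none * q     # exactly one probing site: the informative case
    p_multi = q * q         # two or more sites: not genotypable
    return p_none, p_once, p_multi


if __name__ == "__main__":
    for r in (0.5, 1.0, 3.0):
        p0, p1, p2 = probe_cut_distribution(r)
        print(f"r={r}: none {p0:.2f}, one {p1:.2f}, two+ {p2:.2f}, "
              f"complexity reduction {1 / p0:.1f}-fold")
```

Under these assumptions, the ratio window recommended further on in this section (>0.5 and <3) keeps the fraction of fragments with exactly one probing site between about 0.19 and 0.25 (maximal at r = 1), while the surviving fraction falls from about 0.67 to 0.25, a 1.5- to 4-fold complexity reduction broadly in line with the 2-4 fold stated there. Size selection of the amplifiable fragments and real base composition are ignored.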
In addition to average fragment size, the number of fragments determine the 
complexity of the sample DNA which is critical in view of die limitations of the 
detection sensitivity of micro-array hybridization. In general, the current state of the 

25 ait of microairay hybridization is such that the number of sample fragments should not 
exceed 100,000. All of the above-mentioned requisites can be met by the appropriate 
choice of sampling and probing enzymes. A preferred method of die present invention 
to prepare sample DNAs (amplicons) involves the use of two different sampling 
enzymes, a rare cutt^ mdonudease {e.g. , hexacutter) combined with a frequoat cutter 

30 endonuclease (e.g., tetracutter), as described in EP 0 534 858 Al which describes a 
method called AFLP and which is incoiporated herein by reference. As can be seen 
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from Figure S» the rare cutter enzyme produces large fragments that upon cleavage 
with the fitequent cutter enzyme are cut into a number of smaller firagmrats. This dual 
cleavage generates two types of fragments: the majority having both ends produced by 
the fi:equent cutter (type I) and a minority of fragments having a rare cutter end and a 

frequent cutter end (type II). After ligating different adaptors to each of the ends and
using appropriate primers targeted to the ends of the fragments, only the type II
fragments will be amplified efficiently (see Figure 5). The type I fragments amplify
with greatly reduced efficiency, presumably because the synthetic sequences at the two
ends constitute an inverted repeat. In general, the type II fragments will amplify
synchronously using a single PCR primer pair that anneals to the ends of the
fragments. The size limit is typically around 500 base pairs, but can be increased by
using a different DNA polymerase and other reaction conditions. Thus, as outlined
above, the number of amplifiable fragments will be determined primarily by the choice
of the rare cutter restriction enzyme. By approximation, this number equals two times
the number of cleavage sites for the rare cutter. In a preferred embodiment, restriction
enzymes recognizing 6 nucleotides (hexacutters) or more are used as rare cutters. The
use of a frequent cutter recognizing 4 nucleotides (tetracutter) as second sampling
enzyme results in the production of fragments in the optimal size range for
co-amplification. As probe restriction endonuclease reagents, different tetracutter or
pentacutter enzymes can be used. The probe restriction endonuclease reagent and the
frequent cutter sampling enzyme should preferably be chosen such that the ratio of the
cleavage frequencies of the probing over the sampling reagent is >0.5 and <3. This
ensures that a substantial fraction of the target fragments are cleaved once by the
probing enzyme. It is noted that ESPs cannot be genotyped when the fragments are
cleaved more than once by the probing enzyme. Also, it should be recognized that
cleavage with the probe restriction endonuclease reagent results in a significant
reduction (typically 2-4 fold) of the fragment complexity.
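The fragment classes and counts described above can be illustrated with a small in silico double digest. This is a hypothetical sketch and not part of the disclosed method. It assumes exact-match searches for palindromic sites, models only top-strand cut positions, and uses the random-sequence approximation of one site per 4^k bp for the size estimate. All function names are invented for this example.

```python
import re


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Top-strand cut positions for every occurrence of a palindromic site.

    `offset` is the cut position within the site (1 for EcoRI G^AATTC and
    for BfaI C^TAG). The lookahead also finds overlapping matches.
    """
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq: str, rare: tuple[str, int], frequent: tuple[str, int]):
    """Cut `seq` with a rare and a frequent cutter, then label each internal
    fragment by the enzymes that produced its two ends."""
    cuts = sorted(
        [(p, "rare") for p in cut_positions(seq, *rare)]
        + [(p, "frequent") for p in cut_positions(seq, *frequent)]
    )
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        ends = {left, right}
        if ends == {"rare", "frequent"}:
            kind = "II"  # rare + frequent end: amplified efficiently
        elif ends == {"frequent"}:
            kind = "I"  # two frequent-cutter ends: inverted repeat, poorly amplified
        else:
            kind = "rare-rare"
        fragments.append((seq[start:end], kind))
    return fragments


def expected_type_ii(genome_bp: float, rare_site_len: int) -> float:
    """About two type II fragments per rare-cutter site (one per flank),
    with one site expected every 4**k bp in random sequence."""
    return 2 * genome_bp / 4 ** rare_site_len


def within_array_limit(n_fragments: float, limit: int = 100_000) -> bool:
    """Compare the expected sample complexity with the ~100,000-fragment ceiling."""
    return n_fragments <= limit


def cleavage_ratio_ok(probe_sites: int, sampling_sites: int) -> bool:
    """Preferred window for the probing/sampling cleavage-frequency ratio."""
    return 0.5 < probe_sites / sampling_sites < 3


if __name__ == "__main__":
    toy = "AAACTAGAAAAGAATTCAAAACTAGAAAACTAGAAA"
    for frag, kind in double_digest(toy, ("GAATTC", 1), ("CTAG", 1)):
        print(kind, frag)
```

On the toy sequence this prints two type II fragments, the two that flank the single EcoRI site, followed by one type I fragment. For a mammalian-sized genome of 3 x 10^9 bp, `expected_type_ii` returns about 1.5 million type II fragments. That is far above the 100,000-fragment ceiling, which suggests why the choice of the rare cutter dominates the sample complexity.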

Alternative schemes, different from the one described above, that meet
the requisites of sample complexity, average fragment size and occurrence frequency
of the probe reagent, and that will perform equally well, will be readily apparent to
one of ordinary skill in the art. Alternative schemes may include the use of pairs of
frequent cutters, followed by selective amplification (described in EP 0 534 858 A1),
or the use of type IIS restriction enzymes. Type IIS restriction enzymes are
characterized by an asymmetric recognition sequence. Most of these enzymes cleave
at a defined distance to one side of the recognition site and generate single-stranded
overhangs that have different sequences. Ligation of adaptor sequences that are
complementary to only one type of overhang allows the amplification of specific
subsets of fragments [Kikuya Kato, Nucleic Acids Res. 23: 3685-3690 (1995)]. With
this strategy, the set of fragments obtained with the sampling enzymes can be broken
up into a defined number of complementary and roughly equally complex subsets. Thus,
with these enzymes it is possible to tune the complexity of the sample. The same
strategy can be applied by making use of type II enzymes that have an interrupted
palindromic recognition sequence.
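The subset strategy for type IIS enzymes can be sketched as grouping fragments by the overhang they carry. This is a hypothetical simplification: each fragment is represented by a single overhang (real fragments have two ends), and the input format and function names are invented for illustration. With a 4-nt overhang of arbitrary sequence there are 4^4 = 256 possible groups.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def partition_by_overhang(fragments):
    """Group (name, overhang) pairs by overhang, read 5'->3' on the fragment.

    Each group is the subset that one overhang-specific adaptor could capture.
    """
    subsets = defaultdict(list)
    for name, overhang in fragments:
        subsets[overhang].append(name)
    return dict(subsets)


def adaptor_ligates(fragment_overhang: str, adaptor_overhang: str) -> bool:
    """Sticky ends anneal when one overhang is the reverse complement of the other."""
    return adaptor_overhang == revcomp(fragment_overhang)


if __name__ == "__main__":
    frags = [("f1", "ACGA"), ("f2", "TTGC"), ("f3", "ACGA")]
    print(partition_by_overhang(frags))
    print(adaptor_ligates("ACGA", "TCGT"))
```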

Type of mutations detected by format-I RAA: In essence, the method
of the invention aims to detect mutations affecting the recognition sequences of the
site-specific probe endonuclease reagents. When the probe enzyme cleaves a sample
fragment, the fragment is prevented from being amplified and, as a consequence, it will
not give a hybridization signal with its cognate probe. Mutations affecting the
recognition sequence of the probe enzyme will allow amplification of the sample
fragment and will restore the hybridization signal. It is recognized that mutations other
than those affecting the probe enzyme recognition sites may affect the hybridization
signals. In particular, mutations affecting the recognition sites of the sampling
enzymes may also lead to a loss of hybridization signal. Consequently, the mere
detection of a hybridization difference between two samples does not qualify the
difference as being due to an ESP for the probing enzyme. For this, one must also
assay the two samples without probing enzyme cleavage; only those differences that are
correlated with the cleavage by the probing enzyme qualify as genuine ESPs as defined
according to the present invention. Therefore, a preferred embodiment of the methods
of the present invention comprises the comparison of the hybridization signals obtained
with and without cleavage of the same starting material by the probe endonuclease
reagent. Preferably, the digested and undigested sample DNAs are differentially
labeled such that equivalent amounts of the material can be mixed and hybridized
against the same array of probes. It is noted that a further advantage of measuring the
relative hybridization signals obtained with digested and undigested sample DNAs is
that the signal given by the undigested sample DNA serves as an internal control for
correcting variations in amplification and hybridization.
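The comparison of digested and undigested signals can be expressed as a simple scoring rule for each probe. This is a hypothetical sketch and not the inventors' procedure. It assumes a diploid sample in which both alleles amplify with similar efficiency, so the digested/undigested ratio is expected to be near 0, 0.5 or 1. The intensity floor and ratio cut-offs are illustrative values that would need calibration on control fragments.

```python
def call_esp(digested: float, undigested: float,
             min_undigested: float = 500.0,
             low: float = 0.25, high: float = 0.75) -> str:
    """Score one probe from the digested and undigested hybridization signals.

    All thresholds are hypothetical and given in arbitrary intensity units.
    """
    if undigested < min_undigested:
        # No signal even without the probing enzyme (e.g. a sampling-site
        # mutation), so any difference is not an ESP readout.
        return "no call"
    # The undigested signal acts as the internal control for amplification
    # and hybridization efficiency.
    ratio = digested / undigested
    if ratio < low:
        return "intact/intact"      # site present on both alleles: cleaved
    if ratio > high:
        return "mutated/mutated"    # site lost on both alleles: amplified
    return "intact/mutated"


if __name__ == "__main__":
    for d, u in [(20, 1000), (480, 1000), (950, 1000), (10, 100)]:
        print(d, u, call_esp(d, u))
```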

Identification and design of informative probes to detect ESP-harboring
fragments. In a preferred embodiment of the present invention, sample
DNAs (amplicons) are hybridized to micro-arrays comprising a set of probe DNAs
which are designed such that each probe will hybridize specifically to one sample DNA
fragment. For each set of sample DNA fragments, a specific set of probes is
developed that will detect all the ESPs present in the set of sample DNAs. Since in
most applications only a (minor) fraction of the sample DNAs will actually carry an
ESP for a particular probing reagent, the set of probe DNAs will preferably consist of
a subset of the sample DNA fragments that are informative in that they hybridize to
ESP-harboring sample fragments. Preferably, the probes are highly specific for the
ESP-carrying sample fragments and do not cross-hybridize with other fragments in the
sample. This feature is verified by testing the candidate probes in control hybridization
assays. When developing or designing the probes, care should be taken to avoid
hybridization of the labeled primer used to amplify the sample fragments. When the
probes correspond to a subset of the sample fragments, the sample fragments should
preferably be amplified with a set of adaptors different from the one attached to the
probes.
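One crude way to screen candidate probes for the labeled-primer problem is to look for long exact matches between the probe and the primer on either strand. This is a hypothetical heuristic, not the method of the invention. The 10-base threshold is an assumption, and a real design would rely on proper hybridization thermodynamics. The probe in the usage example is a made-up fragment carrying the EcoRI adaptor of Example 1. The two primers are the EcoRI primer core of Example 1 and the alternative EcoRI primer given there (SEQ ID NO: 28).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest common substring (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def primer_may_bind_probe(probe: str, labeled_primer: str, max_run: int = 10) -> bool:
    """Flag the probe if it shares `max_run` or more contiguous bases with the
    labeled primer on either strand (threshold is an assumption)."""
    run = max(longest_shared_run(probe, labeled_primer),
              longest_shared_run(probe, revcomp(labeled_primer)))
    return run >= max_run


if __name__ == "__main__":
    probe = "CTCGTAGACTGCGTACCAATTCAAAAAAAA"  # hypothetical arrayed fragment end
    print(primer_may_bind_probe(probe, "GACTGCGTACCAATTC"))  # same adaptor
    print(primer_may_bind_probe(probe, "CTGACGCATCCAATTC"))  # alternative adaptor
```

The primer derived from the same adaptor as the probe is flagged. The primer from the alternative adaptor shares only a short run around the restriction-site remnant and is not flagged.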

The sections below describe different approaches that may be used to
assemble sets of unique probe DNAs for fabricating the micro-arrays. Three
alternative approaches are presented; the choice among them is determined primarily by
the degree of nucleotide sequence variation, and hence the ESP frequency, present in the
species under study.

(1) Direct screening. When the ESP frequency is high, such that 10% or more of the
sample fragments carry ESPs, a realistic approach for assembling ESP probes is
to array individual sample fragments and test which of them detect an ESP in the
test material under study. The advantage of this approach is that the same set of
fragments can be tested with different probe enzymes. After the screening, one
will retain only those probes that yield a clear-cut difference in hybridization
between the different test DNAs. This approach is illustrated in Example 2.

(2) Gel-based screening. With genomic DNA exhibiting intermediate ESP frequencies
(a few %), useful probes can be identified with a gel-based screening approach
in which the ESPs are identified by comparing the patterns of sample fragments
obtained from cleaved and uncleaved genomic DNA of various individuals. The
polymorphic fragments can then be isolated from the gel and cloned or amplified.
In a second phase, these probe fragments are verified in a micro-array
hybridization assay. This approach is illustrated in Example 1.

(3) Batch-wise hybridization selection method. Since both approaches described above
are inefficient and labor intensive when the ESP frequency is low (<1%), it is
advantageous to directly select or enrich ESP-carrying fragments. Such an
approach is described in greater detail in Example 3.
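The choice among the three approaches can be summarized as a decision rule on the expected ESP frequency. This is a minimal sketch. The 1% to 10% band for gel-based screening is an assumed reading of "a few %", and the function name is hypothetical.

```python
def choose_probe_strategy(esp_fraction: float) -> str:
    """Map the expected fraction of ESP-carrying sample fragments to one of the
    three probe-assembly approaches (band boundaries partly assumed)."""
    if esp_fraction >= 0.10:
        return "direct screening"
    if esp_fraction >= 0.01:
        return "gel-based screening"
    return "batch-wise hybridization selection"


if __name__ == "__main__":
    # 0.025 corresponds to the ~2.5% estimate used for Arabidopsis in Example 1.
    for f in (0.12, 0.025, 0.005):
        print(f, choose_probe_strategy(f))
```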
The methods of the invention can be used with any type of micro-array:
spotted ESP-carrying fragments, spotted oligonucleotides or oligonucleotides
synthesized on solid supports using photolithography [Fodor, S. P. A. et al., Science
251 (1991)]. Oligonucleotide probes can easily be designed based on the
nucleotide sequences of the ESP-carrying fragments. Also, the methods of the
invention are not limited to the use of planar arrays containing spatially addressable
probes. A person of skill in the art will recognize that the methods may also employ
a multitude of identifiable solid phase particles (e.g., beads, spheres and polyhedra),
each carrying a different probe. Examples of such use are described by Fulton, R.
[U.S. Patent No. 5,736,330] and Mandecki, W. [U.S. Patent No. 5,736,332].


(ii) Format-II RAA: General outline

The 'format-I RAA', as described above, can be converted to a
'format-II assay' when sufficient sequence information on ESP-containing sample
fragments becomes known. Format-II RAAs can also be designed on the basis of the
known sequences of genomic regions that harbor an ESP and that are available through
publicly accessible databases. The approach involves the targeted sampling of starting
material and consists of the design of dedicated primer pairs that flank the ESP sites.
As in format-I RAA, if the site is intact, the starting DNA will be cleaved and no
PCR product will be generated. Only when the site is mutated will the amplicon be
generated. In practice, multiple ESP-containing genomic regions are co-amplified after
cleavage with the probing restriction endonuclease reagent. The ultimate sample DNA
used in the hybridization reaction is composed of several such multiplex PCR reactions
pooled together. The feasibility of this approach is evidenced by the recent paper of
Wang et al., Science 280: 1077-1082 (1998), incorporated herein by reference. The
methods for format-II RAA described here are identical to the approach described by
Wang et al. in the way certain allelic regions are co-amplified, but fundamentally
different in the way they are diagnosed. The present method takes advantage of the
clear distinction between having or not having an amplicon depending upon the allelic
state of the endonuclease target site. The Wang et al. approach, in contrast, relies on the
detection of a hybridization difference resulting from a single nucleotide variation in the
PCR product. This requires a much more elaborate and redundant hybridization assay.
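The present/absent logic of format-II RAA can be modeled as the fraction of template copies that survive digestion and can therefore amplify. This is a hypothetical sketch. It assumes complete digestion and equal amplification of the surviving alleles, and it represents each allele by the sequence between the primers. The MseI site TTAA and the toy alleles are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def carries_site(amplicon: str, site: str) -> bool:
    """True if the probing site occurs on either strand of the amplicon."""
    return site in amplicon or revcomp(site) in amplicon


def surviving_fraction(alleles: list[str], site: str) -> float:
    """Fraction of template copies expected to amplify after complete digestion."""
    return sum(not carries_site(a, site) for a in alleles) / len(alleles)


if __name__ == "__main__":
    intact, mutated = "GGCTTAAGG", "GGCTTCAGG"  # hypothetical alleles
    for genotype in ([intact, intact], [intact, mutated], [mutated, mutated]):
        print(surviving_fraction(genotype, "TTAA"))
```

An individual homozygous for the intact site yields no product, a heterozygote yields a product from half of its templates, and an individual homozygous for the mutated site yields full product.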

Similar to format-I RAA, a preferred method consists of comparing the
hybridization signals obtained with and without cleavage with the probe restriction
endonuclease reagent. Preferably, the respective amplification reactions are
differentially labeled such that the resulting amplicons can be mixed and hybridized
against the same array of probes.

Preferred methods of the format-II RAA are those wherein, for each
PCR primer pair, the primer that remains unlabeled is used as hybridization probe
for the corresponding amplicon. This ensures that the excess unincorporated labeled
primer, as well as the primer extension products obtained with this primer, cannot anneal
to the arrayed probe. Also, the unlabeled PCR primer is complementary to the labeled
strand of the amplicon.
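The argument for using the unlabeled primer as the probe can be checked with a perfect-match annealing model. This is a hypothetical sketch. The primers and the insert are made-up sequences, and the model ignores mismatched or partial duplexes. After digestion, linear extension from the labeled primer stops at the MseI cut (T^TAA). That truncated product therefore lacks the region complementary to the arrayed primer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def anneals(probe: str, target: str) -> bool:
    """Perfect-match model: an oligonucleotide anneals to a strand that
    contains its reverse complement."""
    return revcomp(probe) in target


fwd = "GACTGCGTAC"      # hypothetical labeled forward primer
rev = "TTGCAGGCAT"      # hypothetical unlabeled reverse primer, arrayed as probe
insert = "CCGTTAAGGA"   # hypothetical region containing an MseI site (TTAA)

labeled = fwd + insert + revcomp(rev)          # labeled strand of the full amplicon
cut = len(fwd) + insert.index("TTAA") + 1      # MseI cuts between T and TAA
truncated = labeled[:cut]                      # linear product ending at the cut

if __name__ == "__main__":
    print(anneals(rev, labeled))    # full amplicon is detected
    print(anneals(rev, truncated))  # truncated extension product is not
    print(anneals(rev, fwd))        # excess labeled primer is not
```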

Furthermore, the format-II RAA method provides a means to monitor
mutations in specific genes or loci in addition to scanning the entire genome. Indeed,
sets of PCR primers that target ESPs in a specific gene or chromosome region can be
assembled.
An RAA assay with positive detection of both alleles: It is
recognized that the 'present/absent score' of the RAA assay cannot (always) distinguish
between different mutations that can affect cleavage by the probe restriction
endonuclease reagent. In practice, an ESP should not be assayed when available
evidence indicates the existence of two or more such mutations at significant
frequencies in the population.

In a preferred embodiment, the present invention is directed to the
detection of SNPs that result in the simultaneous loss and gain of a restriction enzyme
recognition site, i.e., each of the two alleles is associated with a different recognition site. HgaI
(GACGC) and SfaNI (GATGC) are an example of such reciprocal sites. Use of both
probing endonuclease reagents in side-by-side experiments excludes alternative alleles
and results in easy determination of the zygosity (refer to Example 4).
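The zygosity call from the two reciprocal digests can be sketched as follows. This is a hypothetical illustration. It assumes complete digestion, scans both strands for each site, and uses made-up allele contexts around a C/T SNP. A product survives HgaI digestion only if a GATGC allele is present, and it survives SfaNI digestion only if a GACGC allele is present.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITES = {"HgaI": "GACGC", "SfaNI": "GATGC"}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def survives(allele: str, site: str) -> bool:
    """An allele escapes digestion if the site is absent on both strands."""
    return site not in allele and revcomp(site) not in allele


def digest_products(genotype: list[str]) -> dict[str, bool]:
    """Product present after each single-enzyme digest if any allele survives."""
    return {enz: any(survives(a, site) for a in genotype) for enz, site in SITES.items()}


def call_zygosity(products: dict[str, bool]) -> str:
    hga, sfa = products["HgaI"], products["SfaNI"]
    if hga and sfa:
        return "GACGC/GATGC"
    if hga:
        return "GATGC/GATGC"
    if sfa:
        return "GACGC/GACGC"
    return "no call"  # neither digest yields product, e.g. failed amplification


if __name__ == "__main__":
    c_allele, t_allele = "TTGACGCTT", "TTGATGCTT"  # hypothetical contexts
    for g in ([c_allele, t_allele], [c_allele, c_allele], [t_allele, t_allele]):
        print(call_zygosity(digest_products(g)))
```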

Multi-allelic haplotyping: A single ESP represents a bi-allelic marker,
which is less informative than a variable micro-satellite, which has multiple alleles.
It is possible, however, to compensate for the lower information content by identifying
several ESPs in a specific chromosomal region. Format-II RAA lends itself readily
to such an approach and involves the design of a primer pair that encompasses a region
with a single site for each of the various selected probe endonuclease reagents. It should
be recognized as one of the advantages of the present method that multiple ESPs on a
sample amplicon can be interrogated with a single probe. Furthermore, use of the
probing enzymes, either separately or in various combinations, in parallel experiments
allows the construction of the haplotypes for the ESPs under study. In general, the
statistical associations between traits and specific chromosome regions may be more
apparent when haplotypes rather than individual markers are used.
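The way combined digests resolve haplotypes can be illustrated for a region that is heterozygous at two ESPs. This is a hypothetical sketch with placeholder enzyme names "A" and "B". It assumes that the amplicon carries a single site for each enzyme and that digestion is complete. A product survives a combined digest only if at least one chromosome lacks every site in the combination. As a result, the cis and trans phases give the same single-enzyme results but differ in the double digest.

```python
from itertools import combinations


def digest_profile(haplotypes: list[dict[str, bool]], enzymes: list[str]) -> dict:
    """Map each enzyme combination to whether a PCR product is expected.

    Each haplotype is a dict {enzyme: True if its site is intact}.
    """
    profile = {}
    for r in range(1, len(enzymes) + 1):
        for combo in combinations(enzymes, r):
            profile[combo] = any(all(not hap[e] for e in combo) for hap in haplotypes)
    return profile


if __name__ == "__main__":
    cis = [{"A": True, "B": True}, {"A": False, "B": False}]
    trans = [{"A": True, "B": False}, {"A": False, "B": True}]
    print("cis  ", digest_profile(cis, ["A", "B"]))
    print("trans", digest_profile(trans, ["A", "B"]))
```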

(iii) Format-III RAA:

In a general sense, the format-III RAA represents a method of choice for
very high-density SNP genotyping because it provides a means to overcome the
intrinsic limitations of both the format-I RAA and the format-II RAA. This is
essentially achieved by performing a stepwise amplification involving a
pre-amplification of sample fragments followed by amplification using multiplexed
specific primers. The principal advantage of the pre-amplification step is to reduce the
complexity of the starting DNA, and thus to provide a more favorable starting point
for performing multiplex PCR reactions. It is noted that this improvement is generally
applicable to any multiplex PCR reaction, and is not limited to the methods of the
present invention. Such an approach can also be used when, for example, SNPs are
genotyped using the methods described by Wang et al.

The principal limitation of the format-I RAA lies in the complexity of
the sample DNA that is hybridized to the microarray. Because the second round of
amplification in format-III yields only very small amplicons, which are all informative,
there is no longer a limitation on the number of sample fragments that are interrogated. In
fact, the entire genome may be sampled in a series of parallel pre-amplification
reactions, and the amplicons generated in the different multiplex PCR reactions can then
be pooled together and hybridized to the microarray.

Likewise, the format-III RAA represents preferred methods of format-II
RAA, especially when the ESPs under study are located on fragments generated by one
set of sampling endonuclease reagents. Such stepwise amplification comprises the
co-amplification of sample fragments with a single pair of primers, followed by the
selective amplification of sets of specific ESP-containing regions (see Figure 5). The
principal advantage of the format-III RAA over format-II RAA is that the initial
amplification of the sampling fragments, which represent only a fraction of the total
genome, lowers the amount of starting material required to interrogate a very large
number of ESPs. Also, the approach will facilitate the multiplex amplification of the
ESP-specific amplicons and, consequently, yield a more robust assay.

One preferred embodiment of the format-III RAA is its use to genotype
large numbers of ESPs identified through the use of the format-I RAA. Indeed,
format-I RAA offers a rapid means to discover large numbers of ESPs in any biological
species for which no large body of sequence information is or will be available. Format-I
RAA enables one to discover many sets of ESPs for a number of different probing
enzymes. Using the format-I RAA, each set of ESPs must be assayed on a different
microarray, because otherwise signals for the same sample fragment will overlap with
one another and thus prevent the proper ESP genotype from being determined. Using the
format-III RAA, the ESPs identified with different probing enzymes can be assayed
together on one single microarray, without overlap between the different ESPs. The
reason is that the overlap in the format-I RAA is caused by the non-informative sample
fragments that are always co-amplified with the ESP fragments. These are eliminated
from the mixture by the specific PCR amplification. This embodiment is illustrated in
Examples 2 and 3.

Another preferred embodiment of the format-III RAA is its use to
genotype large numbers of SNPs identified in high-throughput sequencing of genomic
DNA from different individuals of a given species. Given the generally recognized
importance of SNPs for the development of high-resolution genotyping methods,
sequenced SNPs can be expected to accumulate in large numbers in publicly available
databases in the near future. In particular, in the field of human genetic analysis, SNPs
will be discovered at a rapidly increasing rate through the massive genome sequencing
programs now in progress. A similar evolution may be anticipated for many other
species. Hence, we decided to perform an in silico analysis of known human SNPs to
further investigate the potential of the invention. More particularly, we have analyzed
the 3,358 SNP sequences present in the SNP database of the Whitehead Institute [Wang
et al., Science 280: 1077-1082 (1998)]. We have determined how many of these SNPs
represent an ESP for each of 34 known palindromic and non-palindromic tetra- and
penta-nucleotide restriction recognition sequences. When extrapolating this number to
the total number of ESPs in the human genome, assuming a grand total of 3 million
SNPs, it appears that the number of detectable ESPs per probing restriction enzyme
is in the range of 25,000 to 150,000. A cumulative analysis reveals that 53% of the
SNPs affect at least one of the 34 restriction sites; a total of 28% affect the recognition
site for one of the available tetracutter enzymes. The principal conclusion from this
analysis is that many of the considered enzymes, used as probing enzymes according
to the methods of the present invention, will interrogate sufficient SNPs to be able to
build a high-density map of the human genome. It should also be noted that the use of
multiple probing enzymes is easily accommodated in the targeted assay because the
sample has to be subdivided anyway over a number of parallel multiplex PCR
reactions. This embodiment is illustrated in Example 4.
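The in silico test of whether a SNP represents an ESP can be sketched as a check of whether exactly one allele creates a copy of the recognition sequence that overlaps the SNP. The check is followed by a linear extrapolation to an assumed genome-wide total of 3 million SNPs. This is a hypothetical reconstruction and not the analysis actually performed. The input format (flanking sequence plus two alleles) is assumed, and the count of 28 in the usage example is illustrative rather than taken from Table I.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    # Check both strands so that non-palindromic sites (e.g. HgaI GACGC) are found.
    return site in seq or revcomp(site) in seq


def is_esp(left: str, allele1: str, allele2: str, right: str, site: str) -> bool:
    """True if exactly one allele carries a copy of `site` overlapping the SNP."""
    k = len(site)

    def window(allele: str) -> str:
        # Every k-mer inside this (2k - 1)-base window overlaps the SNP position.
        return left[max(0, len(left) - (k - 1)):] + allele + right[:k - 1]

    return has_site(window(allele1), site) != has_site(window(allele2), site)


def extrapolate(n_esp: int, n_snps: int = 3358, genome_total: int = 3_000_000) -> float:
    """Scale the ESP fraction observed among the database SNPs to the genome."""
    return n_esp / n_snps * genome_total


if __name__ == "__main__":
    print(is_esp("GGCAT", "T", "C", "AAGG", "TTAA"))    # MseI site gained/lost
    print(is_esp("AAGA", "C", "T", "GCAA", "GACGC"))    # HgaI
    print(is_esp("AAGA", "C", "T", "GCAA", "GATGC"))    # SfaNI, reciprocal site
    print(round(extrapolate(28)))                        # illustrative count only
```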

It is noted that the format-III RAA may be performed according to
different procedures. One such procedure is diagrammed in Figure 5, in which the
test DNA is first sampled using a sampling endonuclease reagent, pre-amplified and
then treated with the probing endonuclease reagent. Variations on this procedure are
readily recognized by those skilled in the art and include, for example, concomitant
treatment of the test DNA with both the sampling and the probing endonuclease
reagents, and the preparation of sampled DNA fragments using arbitrary PCR priming
methods [Williams et al., Nucleic Acids Res. 18: 6531-6535 (1990)]. Note that when
the treatment with the probing endonuclease reagent is performed prior to the
pre-amplification, the subsequent amplification can be performed with any pair of PCR
primers directed against the ESP-carrying fragments, thus overcoming the
limitation of using PCR primers flanking the ESPs.



Table I. Analysis of 3,358 SNPs in the Whitehead SNP database. The table lists the
number of SNPs that represent an ESP for various probing enzymes. The last column
shows the estimated number of ESPs for each enzyme in the entire human genome
(refer to text for details).
The following illustrative examples were chosen to represent the
spectrum of genomic complexities and the spectrum of degrees of genetic variation
that are amenable to analysis using the methods of the present invention:

Example 1 describes analysis of Arabidopsis (low genomic complexity,
low genetic variation).

Example 2 describes genetic analysis of corn (high genomic complexity,
high genetic variation).

Examples 3 and 4 describe genetic analysis in humans (high genomic
complexity, low genetic variation).

Numbers given in the examples that relate to the occurrence
frequency of certain restriction sites, as well as to the average size of the generated
fragments, are in part based on computer simulations using publicly available DNA
sequences.
Example 1
Genetic Analysis in Arabidopsis
In this example, a fragment analysis-based approach is used to generate
a set of genomic fragments carrying ESPs between the Arabidopsis ecotypes Landsberg
and Columbia, which are commonly used for genetic studies in this model organism.
Arabidopsis is an example of a low-complexity genome (size ~120 Mb), and the two
ecotypes exhibit a moderate level of genetic variability. Previous studies have revealed
that the average nucleotide sequence variation between the two ecotypes is in the order of
1 polymorphism in 150 nucleotides. Consequently, the fraction of fragments expected
to carry an ESP for tetranucleotide-recognizing restriction enzymes is
in the range of 2.5% (1:40). With such a low frequency, it is helpful to use a selection
procedure to isolate the rare fragments containing ESPs.
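The 2.5% estimate can be reproduced with a rough calculation. The calculation rests on simplifying assumptions that are not stated in the original: each sample fragment carries a single 4-nt probing site (k = 4), the polymorphism rate is r = 1/150, and any polymorphism within the site abolishes it. Gain of new sites is ignored.

```latex
P(\mathrm{ESP}) \approx 1 - (1 - r)^{k} \approx k\,r
  = 4 \times \frac{1}{150} \approx 0.027 \approx \frac{1}{38}
```

This is of the same order as the 2.5% (1:40) quoted above.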

In essence, the procedure described in this example comprises the
following steps:

1) Identification of a set of about 200 genomic fragments carrying
Landsberg/Columbia ESPs using a gel-electrophoretic approach.

2) Isolation and characterization of the ESP-carrying DNA
fragments (ESP fragments).

3) Generation of micro-arrays with the ESP fragments.

4) Confirmation of the ESPs by hybridization.



Step 1. Identification of ESP fragments.
Sampling enzymes. In the present example, EcoRI, a restriction enzyme
recognizing 6 nucleotides (hexacutter), in combination with BfaI, a restriction enzyme
recognizing 4 nucleotides (tetracutter), are chosen as sampling enzymes. From the
random frequency of occurrence of 6-nucleotide sequences (every ~4,000 bases), the
number of sites for hexacutter restriction enzymes in this genome is predicted to be in
the range of 30,000. In addition to cleavage with a hexacutter, the genomic DNA is
also cut with a tetracutter so as to generate PCR-amplifiable fragments with an average
size of a few hundred base pairs. Cleavage with the two enzymes gives rise to two
types of fragments: a majority of fragments resulting from cleavage by the tetracutter
enzyme alone and a smaller number of fragments produced by the two enzymes (see Figure
5). Since the majority of the hexacutter fragments will give rise to two fragments
having a hexacutter end and a tetracutter end (see Figure 5), this procedure will yield
a mixture of about 60,000 fragments of this type. Upon amplification using the
procedure described below, only the fragments carrying a tetracutter end and a
hexacutter end are amplified efficiently (Figure 5).
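The site and fragment numbers above follow from the random-sequence approximation of one hexanucleotide site per 4^6 bp. The calculation assumes a 120 Mb genome and two hexacutter/tetracutter fragments per EcoRI site.

```latex
\frac{1.2 \times 10^{8}\ \mathrm{bp}}{4^{6}\ \mathrm{bp}}
  = \frac{1.2 \times 10^{8}}{4096} \approx 2.9 \times 10^{4}\ \text{EcoRI sites},
\qquad
2 \times 2.9 \times 10^{4} \approx 5.9 \times 10^{4} \approx 60{,}000\ \text{fragments}
```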

Probing enzymes. As probing enzymes, many different tetracutter
enzymes can be used. Ideally, the probing enzyme cleaves most of the sample
fragments once. Because plant DNA has a high AT content, the preferred tetracutters
are those that have an AT bias in their recognition sequence. In general, the choice of
an optimal tetracutter may be determined by particular features of the genome being
analyzed (e.g., AT and GC content). In the present example, MseI (recognition site =
TTAA) was chosen. Tsp509I (recognition site = AATT) is an alternative. It is also
conceivable to use mixtures of two or more tetracutter enzymes. The EcoRI-BfaI
sample/target fragments that are cleaved and not cleaved with the MseI probing enzyme
are referred to as cleaved and uncleaved sample/target DNA, respectively.
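The effect of base composition on the cleavage frequency of a probing enzyme can be estimated with an independent-base model. This is a simplifying assumption: real genomes deviate from it. The model counts occurrences on one strand only, which is adequate for palindromic sites such as TTAA. The GC content used in the usage note is chosen for illustration.

```python
def site_probability(site: str, gc: float) -> float:
    """Probability that `site` starts at a given position, assuming independent
    bases with G = C = gc/2 and A = T = (1 - gc)/2."""
    base_p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    p = 1.0
    for base in site:
        p *= base_p[base]
    return p


def mean_spacing(site: str, gc: float) -> float:
    """Expected distance in bp between occurrences of a palindromic site."""
    return 1 / site_probability(site, gc)


if __name__ == "__main__":
    for site in ("TTAA", "AATT", "CCGG"):
        print(site, round(mean_spacing(site, 0.36)))
```

At an illustrative GC content of 36%, TTAA is expected roughly every 95 bp and CCGG only roughly every 950 bp. This is why AT-biased tetracutters cut AT-rich DNA much more frequently.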

Screening for ESP-carrying fragments. To detect ESP fragments, subsets
of uncleaved and cleaved EcoRI-BfaI sample fragments from both ecotypes are
amplified and the amplicons are compared following gel-electrophoretic fractionation.
Subsets of the EcoRI-BfaI sample fragments are selectively amplified as described
[Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995); Zabeau, M. and Vos, P.,
European Patent Application EP 0534858 (1993), both of which are incorporated herein
by reference]. Given the complexity of the sample (~50,000 fragments), the selective
amplifications are performed with EcoRI and BfaI primers having two and three
selective nucleotides, respectively. This equals 1024 (16 x 64) different selective
amplification reactions.
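The count of 1024 reactions follows from enumerating all selective extensions of the two primers. The sketch below does this enumeration and also estimates the fragments per reaction. It assumes, for illustration, that the ~50,000 fragments are spread evenly over the reactions. Function and variable names are hypothetical.

```python
from itertools import product


def selective_extensions(n: int) -> list[str]:
    """All 4**n selective-nucleotide extensions of length n."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


ecori_ext = selective_extensions(2)   # 16 EcoRI primer variants
bfai_ext = selective_extensions(3)    # 64 BfaI primer variants
reactions = list(product(ecori_ext, bfai_ext))

# Assumes the ~50,000 sample fragments are spread evenly over all reactions.
fragments_per_reaction = 50_000 / len(reactions)

if __name__ == "__main__":
    print(len(reactions), round(fragments_per_reaction))
```

Under that assumption, each selective amplification displays on the order of 50 fragments, a number that can be resolved on a single gel lane.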

The experimental procedure described by Vos, P. et al. is followed,
except that the template fragments are incubated at 65°C for 10 minutes to heat-inactivate
the T4 ligase enzyme and, when applicable, digested with the probing
enzyme prior to amplification. The structures of the EcoRI and BfaI adaptors are as
follows [see, e.g., Vos, P. et al., supra]:
5'-CTCGTAGACTGCGTACC (SEQ ID NO: 1)

CATCTGACGCATGGTTAA-5' (SEQ ID NO: 2)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)

TACTCAGGACTCAT-5' (SEQ ID NO: 4)

The EcoRI (radiolabeled by 5'-phosphorylation) and BfaI primers,
having two and three selective nucleotides, respectively, have the following sequences
(where N represents A, C, G or T):

5'-GACTGCGTACCAATTCNN (SEQ ID NO: 5)
5'-GATGAGTCCTGAGTAGNNN (SEQ ID NO: 6)
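The adaptor and primer sequences above can be checked for internal consistency. The sketch below is a hypothetical consistency check, not part of the protocol. It treats each bottom strand as written 3'->5' and aligned under the 3' end of its top strand, with the single-stranded 5' overhang at the right. It treats each primer core (without N's) as the 3' part of the adaptor top strand followed by the bases contributed by the fragment end after ligation. The same checks can be applied to the alternative EcoRI and BfaI adaptors and primers given later in this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pairs_with_top(top: str, bottom_3to5: str, overhang_len: int) -> bool:
    """Bottom strand (written 3'->5') minus its overhang must be the base-by-base
    complement of the 3' end of the top strand."""
    duplex = bottom_3to5[:-overhang_len]
    return duplex == top[-len(duplex):].translate(COMPLEMENT)


def overhang_5to3(bottom_3to5: str, overhang_len: int) -> str:
    """Sticky end of the adaptor, read 5'->3'."""
    return bottom_3to5[-overhang_len:][::-1]


def primer_matches(primer_core: str, top: str, remnant: str) -> bool:
    """Primer = 3' end of the adaptor top strand + bases the fragment end adds
    after ligation (`remnant`, e.g. AATTC after EcoRI, TAG after BfaI)."""
    return primer_core.endswith(remnant) and top.endswith(primer_core[:-len(remnant)])


if __name__ == "__main__":
    print(pairs_with_top("CTCGTAGACTGCGTACC", "CATCTGACGCATGGTTAA", 4),
          overhang_5to3("CATCTGACGCATGGTTAA", 4))
    print(pairs_with_top("GACGATGAGTCCTGAG", "TACTCAGGACTCAT", 2),
          overhang_5to3("TACTCAGGACTCAT", 2))
    print(primer_matches("GACTGCGTACCAATTC", "CTCGTAGACTGCGTACC", "AATTC"),
          primer_matches("GATGAGTCCTGAGTAG", "GACGATGAGTCCTGAG", "TAG"))
```

The overhangs come out as AATT and TA, matching EcoRI (G^AATTC) and BfaI (C^TAG) ends. The adaptor top strand does not end with the base that would restore the recognition site after ligation, as is usual for AFLP adaptors.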


Using these reagents, most of the obtainable target fragments contain a
cleavage site for the probing enzyme and, consequently, will not be amplified when the
target DNA is cleaved. Most of the fragments that survive the treatment with the
probing enzyme occur in both ecotypes and thus carry no ESP. Occasionally, fragments
are found that appear in both ecotypes when the target DNA is not digested and that
are present in only one of the two ecotypes after digestion. These represent true ESPs
for the probing enzyme. In addition, fragments will also be found that show typical
AFLP polymorphisms between the two ecotypes [Vos, P. et al., Nucleic Acids Res. 23:
4407-4414 (1995)]. Such polymorphisms are apparent in the fragment patterns
obtained with the undigested sample DNAs. A typical result is shown in Figure 6, which
shows the electrophoretic patterns of selectively amplified EcoRI-BfaI
fragments from the ecotypes Columbia and Landsberg, obtained without and with
digestion with the MseI probing enzyme.

Systematic comparison of the patterns of ecotypes Columbia and
Landsberg before and after digestion allows the identification of EcoRI-BfaI sample
amplicons that carry an ESP for the probing enzyme. Using MseI as probing enzyme,
it is estimated that a total of ~200 polymorphic fragments which are present in only one
of the ecotypes can be identified.
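The band-by-band comparison of the four lanes (two ecotypes, undigested and digested) can be written as a small classification rule. This is a hypothetical sketch of the reasoning described above, and band presence is reduced to a boolean. Only a difference that appears after digestion, with both ecotypes showing the band when undigested, counts as an ESP.

```python
def classify_band(col_undigested: bool, ler_undigested: bool,
                  col_digested: bool, ler_digested: bool) -> str:
    """Classify one gel band from presence/absence in the four lanes."""
    if col_undigested != ler_undigested:
        return "AFLP polymorphism"          # differs even without the probing enzyme
    if not col_undigested:
        return "absent"                     # not amplified from either ecotype
    if col_digested != ler_digested:
        return "ESP"                        # difference appears only after probing
    if col_digested:
        return "monomorphic (no site)"      # survives digestion in both ecotypes
    return "monomorphic (site in both)"     # cleaved in both ecotypes


if __name__ == "__main__":
    print(classify_band(True, True, False, True))
```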
Step 2. Isolation and characterization of ESP fragments
Each of the ESP polymorphic fragments is eluted from the gel matrix,
re-amplified and cloned into a suitable plasmid vector (e.g., TA cloning system;
Invitrogen, Carlsbad, CA, U.S.A.). In each case, two clones are selected for sequence
determination. Most duplicate clones will yield the same sequence. Duplicate clones
that gave different sequences were not retained for further work. Since the nucleotide
sequence of over one third of the Arabidopsis genome is available in the public
databases (e.g., GenBank), the chromosomal location of one third of the ESP fragments
can be determined by matching the fragment sequences to the genomic sequence.
Furthermore, since the genomic sequence is derived from ecotype Columbia, we expect
a perfect match with the fragment sequences isolated from the same ecotype. The
sequences of the fragments isolated from ecotype Landsberg will reveal single
nucleotide differences, amongst which the potential restriction site mutations affecting
the MseI recognition sites should be apparent.
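Spotting which Landsberg differences affect an MseI site can be sketched as a scan of an aligned pair of sequences. This is a hypothetical illustration. It assumes a gap-free alignment of equal length, and it searches one strand only, which is sufficient for the palindromic TTAA site. A non-palindromic site would also need its reverse complement.

```python
def site_starts(seq: str, site: str) -> set[int]:
    """Start positions of every (possibly overlapping) occurrence of `site`."""
    return {i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site}


def site_changing_differences(col: str, ler: str, site: str = "TTAA") -> list[int]:
    """Positions where Columbia and Landsberg differ inside a site that is
    present in only one of the two ecotypes."""
    if len(col) != len(ler):
        raise ValueError("expects a gap-free alignment of equal length")
    diffs = [i for i, (a, b) in enumerate(zip(col, ler)) if a != b]
    changed = site_starts(col, site) ^ site_starts(ler, site)  # sites in one ecotype only
    return [i for i in diffs if any(s <= i < s + len(site) for s in changed)]


if __name__ == "__main__":
    # Hypothetical aligned clone sequences: position 5 destroys TTAA,
    # position 12 is a difference outside any MseI site.
    print(site_changing_differences("GGCTTAAGGACGT", "GGCTTCAGGACGA"))
```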

In addition to the ESP polymorphic fragments, a number of non-polymorphic
control fragments are processed in the same way. Two types of such
monomorphic control fragments are isolated: fragments that do not carry a site for the
probing enzyme and fragments that carry a site for the probing enzyme in both
ecotypes. These fragments serve the purpose of verifying the hybridization on the
micro-arrays.

Step 3. Fabrication of ESP micro-arrays
Micro-arrays of amplified fragments. The insert DNAs from the
sequence-verified clones are amplified, e.g., with the use of non-selective EcoRI and
BfaI primers. PCR products are verified by agarose gel electrophoresis and retained if
a single product of the correct mobility was present. Following ethanol precipitation,
the resuspended PCR products are arrayed at high density on standard glass slides (25
x 76 mm) using either the Multigrid robotic spotter (GeneMachines™, Genomic
Instrumentation Services Inc., Menlo Park, CA, U.S.A.) or the BioChip Arrayer™
(Packard Instrument Company, Meriden, CT, U.S.A.). The DNAs are spotted in a
logical order with respect to the ecotype from which the fragments were isolated (upper
and lower panel), as shown in Figure 7. In addition, a set of DNAs from monomorphic
control fragments was spotted next to the ESP fragment DNAs (right panel in Figure
7).

Micro-arrays of oligonucleotides. Based on the nucleotide sequences of
the ESP fragments, oligonucleotides can be designed that can serve as hybridization
probes to specifically detect each amplified sample fragment. The oligonucleotide probe
should preferably match a sequence that is located to one side of the ESP,
opposite the side where the sequence targeted by the labeled primer is located. In this
way the background is minimized, because the linear amplification products generated
by the labeled primer following digestion with the probing enzyme are not detected.
The ESP fragment-specific oligonucleotides are spotted in a micro-array format in
exactly the same way as the amplified ESP fragments.
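The placement rule for oligonucleotide probes can be sketched as choosing a window on the far side of the probing site, away from the labeled primer. This is a hypothetical illustration and every parameter is an assumption. The fragment is written 5'->3' from the labeled-primer end and taken from the allele that carries the site. The last `primer_len` bases are the unlabeled-primer binding site, which the probe avoids. The probe is the reverse complement of the window, so that it detects the labeled strand. Truncated products from the labeled primer end at the cut site and so never reach the window.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def oligo_probe_window(fragment: str, site: str, primer_len: int,
                       probe_len: int = 25):
    """Return (start, end) of a probe window past the first `site`, ending
    before the unlabeled-primer region, or None if there is no room."""
    pos = fragment.find(site)
    if pos < 0:
        return None                          # allele without the site
    start = pos + len(site)                  # first base past the probing site
    limit = len(fragment) - primer_len       # stop before the primer region
    if limit - start < probe_len:
        return None
    return start, start + probe_len


def probe_sequence(fragment: str, window: tuple[int, int]) -> str:
    """Oligonucleotide complementary to the labeled strand in the window."""
    start, end = window
    return fragment[start:end].translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    frag = "A" * 20 + "TTAA" + "C" * 30 + "G" * 16   # hypothetical fragment
    win = oligo_probe_window(frag, "TTAA", primer_len=16)
    print(win, probe_sequence(frag, win))
```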

Step 4. Micro-array-based detection of ESPs

Preparation of the sample DNAs. For each ecotype, sample DNA is
prepared in two different ways. Genomic DNA, digested with the sampling restriction
enzymes EcoRI and BfaI, was amplified either as such or after cleavage with the
probing enzyme MseI. The amplification reactions are performed with a fluorescently
labeled EcoRI primer and an unlabeled BfaI primer, both without selective nucleotides.
The EcoRI primer is labeled by incorporation of Cy3 (green)- and Cy5 (red)-amidites
during primer synthesis (Amersham Pharmacia Biotech, Uppsala, Sweden). For both
Columbia and Landsberg, the cleaved sample was amplified with a Cy3-labeled primer while
the uncleaved fragments were amplified with a Cy5-labeled EcoRI primer. In addition,
the Landsberg digested material was also amplified with a Cy5-labeled EcoRI PCR
primer. Three different hybridization solutions are then prepared by mixing equal
amounts (i.e., equal volumes) of the Cy3- and Cy5-labeled amplification reactions: one
from the Columbia cleaved and uncleaved samples, a second from the Landsberg
cleaved and uncleaved samples, and a third by mixing the differentially labeled cleaved
samples of both ecotypes (Cy3-labeled Columbia and Cy5-labeled Landsberg).

If arrays of PCR products, rather than oligonucleotides, are used
as probes (refer to step 3), the co-amplification of the EcoRI-BfaI sample fragments is
preferably accomplished with a pair of adaptors that differs from those attached to the
arrayed probes. The alternative EcoRI and BfaI adaptors have the following structure:

5'-GAGCATCTGACGCATCC (SEQ ID NO: 26)
GTAGACTGCGTAGGTTAA-5' (SEQ ID NO: 27)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
ATGAGTCCTGACAT-5' (SEQ ID NO: 14)

The cognate non-selective EcoRI and BfaI primers have the following sequences:

5'-CTGACGCATCCAATTC (SEQ ID NO: 28)
5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

Micro-array hybridization. Each of the hybridization solutions is allowed
to hybridize to the arrayed probes using protocols well known in the art. The
experimental conditions depend primarily on the nature of the probes: PCR-amplified
fragments versus oligonucleotides. Both types of experiments are amply described in the
literature: Wodicka, L. et al., Nature Biotechnol. 15: 1359-1367 (1997); Lockhart, D.
J. et al., Nature Biotechnol. 14: 1675-1680 (1996); DeRisi, J. L. et al., Science 278:
680-686 (1997); Shalon, D. et al., Genome Res. 6: 639-645 (1996); KAu, G. et al.,
Genome Res. 6: 492-503 (1996); Chee, M. et al., Science 274: 610-614 (1996); Wang,
D. G. et al., Science 280: 1077-1082 (1998); Winzeler, E. A. et al., Science 281: 1194-
1197 (1998), all of which are incorporated herein by reference.

A laser scanning system (ScanArray 3000; General Scanning Inc.,
Watertown, MA, U.S.A.) is used to detect the two-color fluorescence hybridization
signals from the micro-arrays at a resolution of 10 microns per pixel. A separate scan
is carried out for each of the two fluorophores used. Scanning parameters and laser
power settings are adjusted to normalize the signal in the two channels (channel 1/Cy3;
channel 2/Cy5). The digital images obtained are analyzed using the ImaGene™ image
analysis software (BioDiscovery Inc., Los Angeles, CA, U.S.A.). The extracted
quantitative data are transferred to a spreadsheet for further analysis.

The present hybridization experiment is essentially set up as a
confirmation of the gel-electrophoretic data (refer to step 1) and therefore has a
predictable outcome. In addition, a number of control probes are included on the
biochip that detect monomorphic EcoRI-BfaI Arabidopsis fragments (i.e., fragments
on which a site for the probing enzyme is either present or absent in both ecotypes).
The results from these control probes allow correction for background and optical
cross-talk between the two channels, as well as calibration of the red and green
hybridization signals. It is anticipated that the vast majority of the processed data are
unambiguous with respect to the allelic state of a sample fragment and in agreement
with the gel-electrophoretic analysis. Figure 7 shows a false-color representation of the
idealized results of the present experiment using a fictitious array of probes. It cannot
be excluded that certain hybridization results disagree with the gel-electrophoretic
assay and/or that certain probes do not allow unambiguous determination of the allelic
state of the cognate sample fragment. Such probes should be excluded from the
micro-arrays used to genotype experimental Arabidopsis samples other than the
Columbia and Landsberg controls used in the present illustrative example.
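The background, cross-talk and calibration corrections described above could be implemented along the following lines. This is a sketch under stated assumptions: a linear two-dye spill-over model with known coefficients `a` and `b`, a single background value per channel, and calibration by scaling the median Cy3/Cy5 ratio of the monomorphic control probes to 1. None of these choices is specified in the procedure, and the function names are hypothetical.

```python
from statistics import median


def correct_spot(cy3, cy5, bg3, bg5, a, b):
    """Background-subtract one spot and remove two-channel cross-talk.

    Assumed linear spill-over model (a, b estimated beforehand, e.g. from
    single-dye control spots):
        observed_cy3 = true_cy3 + a * true_cy5
        observed_cy5 = b * true_cy3 + true_cy5
    """
    o3, o5 = cy3 - bg3, cy5 - bg5
    det = 1.0 - a * b
    true3 = (o3 - a * o5) / det  # invert the 2x2 spill-over matrix
    true5 = (o5 - b * o3) / det
    return max(true3, 0.0), max(true5, 0.0)


def normalized_ratios(spots, controls, bg3, bg5, a=0.0, b=0.0):
    """Cy3/Cy5 ratio per probe, scaled so monomorphic controls centre on 1.

    spots: {probe_id: (raw_cy3, raw_cy5)}; controls: ids of monomorphic probes.
    """
    corrected = {p: correct_spot(c3, c5, bg3, bg5, a, b) for p, (c3, c5) in spots.items()}
    scale = median(corrected[p][0] / corrected[p][1] for p in controls)
    return {
        p: (t3 / t5) / scale if t5 > 0 else float("inf")
        for p, (t3, t5) in corrected.items()
    }
```

For example, with a background of 100 in both channels and no cross-talk, two controls at (300, 200) and (500, 300) both correct to a ratio of 2, so a probe at (110, 300) normalizes to 0.025.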

In routine genotyping experiments, either of the hybridization
schemes outlined above can be used. The allelic state can be determined by
comparing the hybridization signals obtained with and without cleavage of the starting
DNA with the probe reagent. Alternatively, allele calling could be based on a
comparison of the signals obtained with the test sample and an appropriate control (e.g.
Columbia or Landsberg DNA), both cleaved with the probe endonuclease reagent. The
samples to be compared can, in principle, be hybridized separately, but a
preferred method consists of hybridizing a mixture of differentially labeled samples to
the same array.

Example 2
Genetic Analysis in Corn

In this example, the utility of the method of the invention for marker-assisted
selection applications in plant and animal breeding is illustrated. Corn has been
chosen because it is a typical representative of crop species having a complex genome.
The large size of the genome (2,400 Mb), the frequent occurrence of repetitive DNA
sequences and the high degree of genetic variation all constitute technical challenges.
The approach used in this example is based on the generation of a set of genomic
fragments carrying ESPs from two well-known inbred lines of corn, B73 and Mo17, from
which many of the elite corn lines are derived. Another reason for choosing these
lines is that a well-studied recombinant inbred population derived from them is
available. This population can be used to map the set of ESPs. The genetic map of ESP
markers will prove to be an effective tool for genetic selection in corn breeding. It is
evident, however, that a broader survey of the corn germplasm, with a total of 10 to 20
lines, will yield a large number of additional ESPs (possibly 2 or 3 times as many) and
will eventually result in a higher-resolution genetic map.

The ESP-harboring fragments could very well be identified by the
gel-electrophoretic approach described for Arabidopsis (Example 1). However, an
alternative strategy may be used, given that the corn germplasm, like that of many crop
species, exhibits a high degree of genetic variation. Indeed, based on previous studies,
the average nucleotide sequence variation in the corn germplasm is estimated to be in
the order of 1 difference in 15 to 30 nucleotides. This corresponds to an ESP frequency
of roughly 1 in 4 in the recognition sites of tetracutter restriction enzymes. At this
frequency it becomes feasible to directly examine arrays of random B73/Mo17
fragments for the presence of ESPs using the present RAA method, without prior
screening or selection. The strategy also lends itself readily to screening with several
different probing enzymes.
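The step from sequence divergence to ESP frequency can be checked with a short calculation. Assuming substitutions are independent and evenly spread, a tetranucleotide site differs between the two lines with probability 1 - (1 - p)^4 for a per-nucleotide difference rate p. At 1 in 15 this gives about 24% (roughly 1 in 4, the figure quoted above); at 1 in 30 it gives about 13% (roughly 1 in 8).

```python
def esp_fraction(diffs_per_nt: float, site_len: int = 4) -> float:
    """Chance that a recognition site of `site_len` nt differs between two
    lines at one or more positions, assuming independent, evenly spread
    substitutions (an idealization)."""
    return 1.0 - (1.0 - diffs_per_nt) ** site_len


if __name__ == "__main__":
    for one_in in (15, 30):
        f = esp_fraction(1 / one_in)
        print(f"1 difference per {one_in} nt: {f:.1%} of tetranucleotide sites "
              f"(about 1 in {1 / f:.0f})")
```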

In the present example, two different approaches for assaying ESPs are
used. The first method (format-I RAA) is similar to the one described in Example 1,
and detects ESPs in fragments sampled with a pair of restriction enzymes. In the
second method (format-III RAA), individual ESPs are selectively amplified from the
sampled fragments with dedicated primer sets. The principal advantage of the latter
approach is that ESPs detected with several different probing enzymes can be assayed
simultaneously, and that multiplex amplification of ESP-specific PCR products is made
considerably more robust.

In essence, the procedure described in this example comprises the
following steps:

1. Identification of a set of candidate ESP fragments from the
inbred lines B73 and Mo17

2. Development of a corn ESP micro-array

3. Genetic mapping of a B73/Mo17 recombinant inbred population
and of segregating populations

Step 1. Identification of candidate ESP fragments
Cloning of a set of sample fragments. To clone a set of random
fragments from the inbred lines B73 and Mo17, the enzyme combination PstI and BfaI
is used. The hexanucleotide-recognizing enzyme PstI was chosen because of the large
size of the corn genome; it is estimated to have around 30,000 sites in the
corn genome. The second, tetracutter enzyme, BfaI, is expected to cleave in the
majority of cases on both sides of the PstI sites. The double digestion will therefore
generate about 60,000 sample fragments with an average size of 400-500 base pairs.

Following double digestion of the genomic DNA, PstI- and BfaI-adaptors
were ligated to the fragment ends and the material was amplified with non-selective
PstI and BfaI primers. The structures of the PstI- and BfaI-adaptors are based
on those described by Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995):

5'-CTCGTAGACTGCGTACATGCA (SEQ ID NO: 7)
3'-CATCTGACGCATGT (SEQ ID NO: 8)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)
3'-TACTCAGGACTCAT (SEQ ID NO: 4)

The corresponding PstI and BfaI non-selective primers have the following sequences:

5'-GACTGCGTACATGCAG (SEQ ID NO: 9)
5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

The amplification step enriches the PstI-BfaI fragments over the large
excess of BfaI-BfaI fragments. After amplification, the fragments are fractionated on
an agarose gel to eliminate fragments smaller than 100 base pairs, and cloned in an
appropriate vector (e.g. TA cloning system; Invitrogen, Carlsbad, CA, U.S.A.).

Preparation of spotted micro-arrays with the cloned sample DNA
fragments. The insert DNAs from the two libraries of cloned PstI-BfaI sample
fragments (obtained from the B73 and Mo17 inbred lines) are amplified from the
clones using the non-selective PstI and BfaI primers. Following purification and
concentration, the amplicons are arrayed as described in Example 1. A total of 20,000
candidate probe DNAs (i.e. 10,000 from each library) are spotted.

Micro-array hybridization and selection of candidate ESP-fragments.
From genomic DNA of the inbred lines B73 and Mo17, four different sets of
PstI/BfaI-digested, amplified DNA are prepared. An alternative pair of adaptors and
amplification primers is used for this:

5'-GAGCATCTGACGCATGTTGCA (SEQ ID NO: 11)
3'-GTAGACTGCGTACA (SEQ ID NO: 12)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGTTGCAG (SEQ ID NO: 15)
5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

30 

The sample fragments are amplified either as such or after digestion with
one of three alternative probing enzymes: MseI, Tsp509I and AluI. Many different
tetracutter or pentacutter enzymes can be used as probing enzymes. Because plant
DNA has a high AT content, the preferred enzymes are those that have an AT bias in
their recognition sequence. Alternatively, mixtures of two or more tetracutter or
pentacutter enzymes can be used.

For each of the B73 samples, a Cy3 (green)-labeled PstI primer is used,
whereas the Mo17-derived fragments are amplified with a Cy5 (red)-labeled PstI primer
(refer to Example 1). Four hybridization solutions are then prepared, each by mixing
equal amounts of the B73 and Mo17 samples that received the same treatment
(uncleaved, MseI-cleaved, Tsp509I-cleaved, or AluI-cleaved). Each of the 4 mixes is
allowed to hybridize to the micro-arrays. Analysis of the scanned images involves
normalization using the multitude of probes on the arrays that detect monomorphic
fragments. Figure 8 shows a false-color representation of the idealized results of the
present experiment using a fictitious array of probes.

Analysis reveals that candidate ESP fragments are readily identified by
scoring the probes that hybridize with only one of the two inbred line sample DNAs
after cleavage with the probe enzyme (Figure 8). The quantitative analysis allows
the use of an unambiguous cut-off threshold of a 10-fold difference in the normalized
signal intensities for scoring ESPs. It should be pointed out that the assay identifies
both bona fide ESPs and polymorphisms in the sampling enzyme sites. Most of the
latter polymorphisms result in a marked hybridization difference with the sample DNAs
not cleaved with the probe enzyme (see Figure 8). Analysis of 180 probes reveals that
roughly 6% of the sample fragments carry ESPs for MseI, Tsp509I, or AluI, in
accordance with the expected ESP mutation frequency. The analysis of 20,000 cloned
probe fragments is thus expected to yield a total of 1,200 fragments carrying ESPs for
the three probe enzymes tested. By using additional tetracutter and pentacutter
enzymes (see Table I), the fraction of ESP-carrying fragments may be as high as 25%,
amounting to 5,000 ESPs.

Of all probes that exhibit a differential hybridization with the cleaved
sample DNAs, only those in which the recognition site for the probing enzyme is
present were retained for development of a corn micro-array. Sequence determination
of these probe fragments reveals the position of the recognition site for the probe
enzyme. Thus, we retained only those probes that failed to give a signal with the
cleaved sample DNA from the same inbred line from which they were isolated. Such
probes exhibit the hybridization pattern shown in the table below and are marked
with an arrow in Figure 8.

B73/Mo17 (Cy3/Cy5) normalized hybridization signal

                 Undigested    MseI/Tsp509I/AluI-digested
B73-probes       ~1            < 0.1
Mo17-probes      ~1            > 10
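The scoring rules of the table, together with the 10-fold cut-off, can be written as a small classifier. This is an illustrative sketch: treating a 10-fold difference in the undigested comparison as the signature of a sampling-site (PstI/BfaI) polymorphism is an assumption based on the remark that such polymorphisms give a marked difference without probe-enzyme cleavage, and the function name and labels are hypothetical.

```python
def classify_probe(undigested: float, digested: float, fold: float = 10.0) -> str:
    """Score one arrayed probe from normalized B73/Mo17 (Cy3/Cy5) signal ratios.

    undigested: ratio with sample DNAs not cleaved by the probing enzyme
    digested:   ratio after cleavage with the probing enzyme (MseI, Tsp509I or AluI)
    fold:       cut-off; 10 gives the < 0.1 / > 10 thresholds in the table
    """
    low, high = 1.0 / fold, fold
    # Assumption: a marked difference already present without probe-enzyme
    # cleavage points to a polymorphism in a sampling-enzyme (PstI/BfaI) site.
    if undigested < low or undigested > high:
        return "sampling-site polymorphism"
    if digested < low:
        return "ESP: probing site present in B73"
    if digested > high:
        return "ESP: probing site present in Mo17"
    return "no ESP detected"
```

A probe scored as "ESP: probing site present in B73" matches the B73-probe row of the table (about 1 undigested, below 0.1 digested).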



Step 2. Development of a corn ESP micro-array

Sequencing of the candidate ESPs and design of marker-specific primers.
Clones corresponding to the probes that yield the desired hybridization pattern (Figure
8) are sequenced. The majority of the insert DNAs derived from these clones contain
a single recognition site for the probing enzyme. For each unique candidate ESP, two
specific PCR primers flanking the restriction site are designed.

In addition, the sequence of a limited set of probes that yielded invariant
hybridization signals is also determined. PCR primers targeting these monomorphic
sequences are included as references; they are used to calibrate the hybridization
signals.

Validation of the candidate ESPs and fabrication of corn micro-arrays.
The candidate ESPs identified under step 1 are subjected to a confirmatory
experiment using the format-III approach. First, four pre-amplification reactions are
performed with a single primer pair, using as template the PstI-BfaI fragments,
either undigested or digested with one of the three probing enzymes. These
amplification reactions reduce the complexity of the DNA under study by more than
two orders of magnitude, while at the same time generating enough
material for the subsequent multiplex marker-specific PCRs. The pre-amplifications
are then used for the PCR rescue of each of the characterized candidate ESPs using
dedicated primer pairs [refer to Wang, D. G. et al., Science 280: 1077-1082 (1998)].
Particular sets of the ESP-specific primers that amplify the same type of ESP (i.e.
ESPs for one particular probing enzyme) are combined in a single reaction, together
with the appropriate pre-amplification material as template. One of the ESP-specific
primers is either Cy3- or Cy5-labeled; the other remains unlabeled. The Cy3-primers
are used for the multiplex amplification of the DNA that had previously been digested
with a probing enzyme, whereas the Cy5-primers are used with undigested control
DNA. The PCR products from the various multiplex reactions performed on both
digested and undigested DNA are pooled to obtain a single hybridization
mixture per starting DNA. The B73- and Mo17-derived material is analyzed in
parallel experiments. The set of unlabeled ESP-specific PCR primers serves as
hybridization probes and is arrayed in the same way as amplification products.
The conditions used are similar to those previously described for hybridization against
oligonucleotide probes and are readily determined by one of ordinary skill in the art.

Direct comparison of the normalized Cy3 and Cy5 hybridization signals
allows determination of the allelic state of the endonuclease target site in B73 versus
Mo17. Primer pairs that do not allow unambiguous allele calling, or that do not
confirm the candidate ESPs identified with PstI-BfaI sampling (refer to step 1), are not
retained for further work.

Step 3. Genetic analysis of a B73/Mo17 recombinant inbred population and of
segregating populations
Genetic analysis of a B73/Mo17 recombinant inbred population. A collection of
recombinant inbred lines derived from a cross between B73 and Mo17 is publicly
available and provides a most useful set of lines for verifying and mapping the
collection of ESP markers. The advantage of recombinant inbred lines over segregating
populations is that each inbred line contains a different set of homozygous chromosome
segments derived from either parent line. Consequently, each ESP will be scored as
either present or absent. Preparation of the sample DNAs and hybridization against the
arrayed probes are performed as described under step 2. The experiment will, in the
first place, allow the testing of selected ESPs in over 100 measurements; the results
will lead to the development of a second-generation system that detects only the
most consistent ESPs. In addition, the linkage analysis of the segregation data will
allow the construction of a fine genetic map of the markers. Finally, based on the
mapping data, an ordered ESP micro-array is developed for corn.

Genetic analysis of segregating populations. Although the ESPs were isolated from
only two inbred lines, it is anticipated that the above-mentioned ordered ESP micro-arrays
will detect sufficient genetic polymorphism in other corn lines to be useful for marker-assisted
selection. To demonstrate the applicability, one could choose either a
segregating F2 population or a back-cross population. Sample preparations and
hybridizations are again performed as described under step 2. In this experiment, the
ESP markers must be scored quantitatively so as to differentiate between heterozygosity
and homozygosity. Because only the most consistent markers are retained, a two-fold
difference in signal intensity is easily monitored. The approach used consists of
normalizing the hybridization signal intensities and then applying a mixture model
analysis to the normalized data. This statistical approach consists of determining
whether the relative signal intensities can be grouped into three discrete classes,
corresponding respectively to homozygous present, heterozygous and homozygous
absent. ESP markers that do not fulfill this criterion should be eliminated from the
analysis.
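The mixture-model step might look as follows. This sketch fits a three-component one-dimensional Gaussian mixture by expectation-maximization to log2-transformed normalized intensities, on which a two-fold difference is a step of 1. The initialization at the minimum, median and maximum, the iteration count, and the separation criterion in `three_discrete_classes` are assumptions, not parameters from the text; the function names are hypothetical. The lowest, intermediate and highest classes would correspond to homozygous absent, heterozygous and homozygous present.

```python
import math


def _normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_three_classes(values, n_iter=200):
    """Fit a three-component 1-D Gaussian mixture to log2 signal values by EM.

    Returns (weights, means, sds) ordered from the lowest to the highest mean.
    """
    xs = sorted(values)
    n = len(xs)
    means = [xs[0], xs[n // 2], xs[-1]]  # start at min, median, max
    sds = [max((xs[-1] - xs[0]) / 6.0, 1e-3)] * 3
    weights = [1.0 / 3.0] * 3
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each value
        resp = []
        for x in xs:
            dens = [w * _normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each class
        for k in range(3):
            rk = sum(r[k] for r in resp) or 1e-12
            weights[k] = rk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / rk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / rk
            sds[k] = max(math.sqrt(var), 1e-3)
    order = sorted(range(3), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [sds[k] for k in order])


def three_discrete_classes(means, sds, min_separation=3.0):
    """Assumed criterion: neighbouring class means lie at least
    `min_separation` standard deviations apart."""
    return all(
        (means[k + 1] - means[k]) >= min_separation * max(sds[k], sds[k + 1])
        for k in range(2)
    )
```

A marker whose fitted classes fail `three_discrete_classes` would be the kind of marker the text eliminates from the analysis.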

Example 3

Human Genetic Analysis Using the Format-I RAA
This example illustrates the application of the method of the invention
for genome-wide genetic analysis in humans. The human genome is an example of a
high-complexity genome (size ~3,000 Mb) combined with a very low level of genetic
variability. Single nucleotide differences between pairs of allelic sequences from
different individuals occur approximately once in every 1000 base pairs; in the
population at large, the frequency may be in the order of 1:300. As with Arabidopsis,
such a low frequency necessitates the use of a selection procedure for the
isolation/enrichment of the rare ESP-harboring fragments. In this example, batch-wise
hybridization is used to accomplish this.

Based on the known mutation frequencies, it can be estimated that the
ESP frequency for a tetracutter probing enzyme is in the order of 1 in 125 recognition
sites. This low level of genetic variation, in combination with the sensitivity of micro-array
hybridization, limits the number of ESPs that can be detected in a single assay
(typically ranging from a few hundred to one thousand, a few thousand at the most).
These limitations can, to a certain extent, be overcome by choosing probing enzymes
that recognize tetranucleotide sites containing a CpG dinucleotide. Indeed, it is well
documented that a substantial fraction (> 23%) of the nucleotide substitutions in the
human genome result from C to T transitions in CpG dinucleotides. Such CpG
dinucleotides represent mutational hotspots in vertebrates because a large fraction of
the cytosines are methylated and subsequently mutate to thymine by deamination. It is
estimated that the mutation frequency of methylated cytosines is 6 to 8-fold higher than
average. Hence, probing enzymes that cleave CpG-containing recognition sites will
yield ESPs at correspondingly higher frequencies, estimated at ~5%. However, the
adverse consequence of the high mutation rate is that CpG is relatively rare in
mammalian DNA, occurring with a frequency of 1 in 100 nucleotides [Wang, D. G.
et al., Science 280: 1077-1082 (1998)] instead of 1 in 16. Likewise, the frequency of
CpG-containing tetranucleotide sites is 1 in ~1600 bases instead of 1 in 256 bases. To
compensate for this, a probe endonuclease reagent can be used comprising two or
more of the following complementary restriction enzymes: TaqI (TCGA), MspI
(CCGG), MaeII (ACGT), and HinP1I or HhaI (GCGC). It should be noted, however, that
cleavage by MaeII, as well as by the isoschizomers HinP1I and HhaI, is blocked by
methylation of the cytosine residue within the CpG dinucleotide. These enzymes
will thus only cleave at a fraction of their sites, namely the non-methylated sites.
Analysis of the large amount of publicly accessible human genomic DNA sequence
shows that the cocktail of the 4 enzymes will cleave once in every 400 bp on average.
The total number of sites in the genome is thus in the order of 1.5 million. Assuming
that the ESP frequency is 5%, the enzyme cocktail has the potential of detecting
~75,000 ESPs. In addition to using combinations of restriction endonucleases, one
may also use reaction conditions that decrease the cleavage specificity. Such a strategy
has been applied to obtain a restriction endonuclease reagent, designated CGaseI, that
is capable of cleaving DNA at CpG dinucleotides [Mead, D. et al., WO 94/21663].
This CGaseI restriction endonuclease reagent may be particularly useful for the analysis
of human polymorphisms using the methods of the present invention.
The example described below illustrates the approach in a limited-scale
assay, which characterizes the human ESPs within CpG-containing tetranucleotide
recognition sites using the sampling enzyme combination PacI-BfaI. The rare cutter
PacI is estimated to have only about 50,000 cleavage sites in the human genome; the
frequent cutter BfaI will generate two fragments per PacI site. The PacI-BfaI
combination will, therefore, create a moderately complex set of 100,000 PacI-BfaI target fragments.
This fragment set captures a sizable number of CpG-containing restriction sites,
estimated in the order of 40,000. Assuming a 5% ESP frequency, the number of
detectable ESPs is in the order of 2000. It should be stressed that many different
sampling enzyme combinations can be used, and that a substantial fraction of the
~375,000 ESPs located within NCGN-type restriction sites can thus be monitored.
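The frequency estimates in the two preceding paragraphs follow from simple arithmetic, reproduced below as a sketch. It treats the two flanking positions of an NCGN site as independent bases at 1 in 4 each, an idealization that ignores base-composition bias; the function names are hypothetical. The 6- to 8-fold uplift applied to the 1-in-125 figure gives 4.8% to 6.4%, consistent with the ~5% estimate, and 5% of the ~40,000 CpG-containing sites in the PacI-BfaI fragment set gives the ~2,000 detectable ESPs.

```python
def ncgn_site_frequency(cpg_freq: float, base_freq: float = 0.25) -> float:
    """Frequency of one specific NCGN tetranucleotide (e.g. TCGA), treating
    the two flanking bases as independent with frequency `base_freq` each."""
    return base_freq * cpg_freq * base_freq


def expected_esps(n_sites: float, esp_freq: float) -> float:
    """Expected number of ESPs among `n_sites` probing-enzyme sites."""
    return n_sites * esp_freq


if __name__ == "__main__":
    print("1 in", round(1 / ncgn_site_frequency(1 / 16)), "(CpG at 1 in 16)")
    print("1 in", round(1 / ncgn_site_frequency(1 / 100)), "(CpG at 1 in 100)")
    # 6- to 8-fold elevated mutation rate applied to the 1-in-125 tetracutter estimate
    for fold in (6, 8):
        print(f"{fold}-fold: {fold / 125:.1%} ESP frequency at CpG-containing sites")
    print(round(expected_esps(40_000, 0.05)), "ESPs among PacI-BfaI fragments")
```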

The procedure outlined in this example comprises the following steps:

(1) Development of a set of candidate PacI-BfaI ESP fragments

(2) Genetic analysis of humans using ESP probe fragments

Step 1. Development of a set of PacI-BfaI probe fragments
A mixture of sample fragments, derived from various individuals in the
population, can be divided into three classes with respect to sites for the probing enzyme:
monomorphic fragments that are devoid of a cleavage site, fragments that are always
cleaved, and fragments that carry one polymorphic recognition site. Fragments that are
digested will be referred to as S+ fragments and fragments lacking the site as S-
fragments. Polymorphic ESP fragments will thus be the only fragments present in both
the S+ and S- populations of sampling fragments. This forms the basis for their
selection by batch-wise hybridization: only ESP fragments are capable of annealing
when the S+ and S- fragment collections are mixed. The hybridization-selection can be
performed in two different, reciprocal ways: either the S+ fragments are used to
retrieve the matching S- fragments, or the S- fragments are used to collect the
complementary S+ sampling fragments. In one approach, the selected candidate ESP
fragments may be isolated by cloning, arrayed, and subsequently validated by testing
various sample DNAs (e.g. the various sample DNAs used as starting material for the
hybridization-selection). Candidate ESP probe fragments that appear to detect
monomorphic sample fragments may either be removed from the array or retained as
control elements on the array. An alternative approach consists of performing the two
reciprocal hybridization-selections, cloning the selected fragments, and identifying
ESPs by matching S+ and S- fragments. The latter strategy is outlined
below.
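The logic of the batch-wise selection can be illustrated with sets. In this idealized sketch every fragment is identified by a hypothetical ID, and annealing is assumed to be perfect and specific; real hybridization-selection only enriches ESP fragments, which is why the procedure repeats the selection and validates the clones.

```python
def candidate_esp_fragments(site_calls):
    """Idealized batch-wise hybridization-selection.

    site_calls: {individual: {fragment_id: True if the probing-enzyme site is present}}
    S+ collects fragments cleaved in at least one individual, S- those left
    uncut in at least one; only fragments found in both pools can anneal.
    """
    s_plus, s_minus = set(), set()
    for calls in site_calls.values():
        for fragment, has_site in calls.items():
            (s_plus if has_site else s_minus).add(fragment)
    return s_plus & s_minus


if __name__ == "__main__":
    calls = {
        "ind1": {"f1": True, "f2": False, "f3": True},   # f1 always cut, f2 never
        "ind2": {"f1": True, "f2": False, "f3": False},  # f3 is polymorphic
    }
    print(sorted(candidate_esp_fragments(calls)))
```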

(i) Preparation of S+ and S- fragments. The preferred starting
material is an equimolar mixture of genomic DNA from a number of representative
individuals. Such individuals (ranging from 5 to 50) may be chosen from various
CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [Wang, D. G. et al.,
Science 280: 1077-1082 (1998)]. Following cleavage of the DNA mixture with the
PacI/BfaI combination of sampling enzymes, appropriate oligonucleotide adaptors as
described above are ligated to the fragment ends. This template DNA is divided into two
aliquots, which are treated separately to prepare the S+ and S- fragment mixes, respectively. To
prepare the S- fragment mix, the target DNA fragments are cleaved with the probing
enzyme and then amplified. This will result in a mixture of fragments that do not
contain sites for the probing enzyme. Furthermore, the S- fragment mixture may be
prepared using one biotinylated primer, such that the resulting PCR product can be
captured onto a solid substrate, such as magnetic beads conjugated with streptavidin.
S+ fragments are prepared by (1) amplifying the mixture of PacI-BfaI fragments, (2)
digesting the PCR product with one of the four NCGN-recognizing enzymes, (3)
ligating appropriate adaptors to the ends generated by the probing enzyme (see EP 0
534 858, incorporated herein by reference), and (4) re-amplifying the resulting
material using one primer that recognizes the probe enzyme adaptor and one primer that
recognizes one specific sampling enzyme adaptor. As with the S- fragments, the
amplification reaction can be performed with a biotinylated primer that
matches the probe enzyme adaptor, such that the S+ fragment mixture can be
immobilized.

Two alternative pairs of PacI- and BfaI-adaptors, as well as the
corresponding non-selective primers, are used; e.g. set I is used for the amplification
of the S- fragments and set II for the preparation of S+ fragments:
Set I 
5'-CTCGTAGACTGCGTACCCAT (SEQ ID NO: 17)
3'-CATCTGACGCATGGG (SEQ ID NO: 18)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)
3'-TACTCAGGACTCAT (SEQ ID NO: 4)

5'-GACTGCGTACCCATTA (SEQ ID NO: 19)
5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

Set II

5'-GAGCATCTGACGCATGGGAT (SEQ ID NO: 20)
3'-GTAGACTGCGTACCC (SEQ ID NO: 21)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGGGATTA (SEQ ID NO: 22)
5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

The adaptor ligated to the ends generated by the NCGN-cleaving probing enzyme and
the corresponding amplification primer have the following structures:

5'-GTCCTCATCGAGCATG (SEQ ID NO: 23)
3'-AGTAGCTCGTACGC (SEQ ID NO: 24)

5'-CCTCATCGAGCATGCG (SEQ ID NO: 25)

(ii) Hybridization-selection step(s). The S- fragment mix is
hybridized to the biotinylated S+ fragments. Following hybridization, the biotinylated
products are captured onto streptavidin-coated magnetic beads. The beads are
repeatedly washed to remove all unhybridized fragments, and thereafter the hybridized
S- fragments are eluted. These are then reamplified with the PacI and BfaI primers, and
the hybridization-selection procedure is repeated at least once. Finally, the amplified
fragments are cloned in an appropriate vector and a series of around 2,000 inserts is
sequenced. To select a set of S+ fragments, this procedure is repeated in reverse, this
time using biotinylated S- fragments. Upon comparison of the S+ and S- sequences, ESP
fragments are readily identified as fragments having partially overlapping sequences
and in which the S- fragment sequence shows a mutated NCGN restriction site at the
internal boundary of the overlap. In this way, >500 ESPs are readily characterized.
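Once S+ and S- sequences have been matched, spotting the mutated NCGN site can be automated. The sketch below simplifies the comparison to two allelic sequences already aligned to equal length; aligning the partially overlapping S+ and S- reads is assumed to have been done beforehand, and the function names are hypothetical. Because TCGA, CCGG, ACGT and GCGC are palindromic, scanning one strand suffices.

```python
# NCGN recognition sites of the four probing enzymes named in this example.
NCGN_SITES = {"TaqI": "TCGA", "MspI": "CCGG", "MaeII": "ACGT", "HinP1I/HhaI": "GCGC"}


def ncgn_sites(seq):
    """(position, enzyme) for every NCGN site in `seq`; all four sites are
    palindromic, so scanning one strand also finds sites on the other."""
    hits = set()
    for enzyme, site in NCGN_SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.add((start, enzyme))
            start = seq.find(site, start + 1)
    return hits


def lost_ncgn_sites(s_plus_allele, s_minus_allele):
    """NCGN sites present in the S+ allele but mutated in the aligned S- allele."""
    if len(s_plus_allele) != len(s_minus_allele):
        raise ValueError("alleles must be aligned to equal length")
    return sorted(ncgn_sites(s_plus_allele) - ncgn_sites(s_minus_allele))


if __name__ == "__main__":
    # Toy alleles: a C-to-T change destroys a TaqI site (TCGA -> TTGA).
    print(lost_ncgn_sites("AATTCGAAGG", "AATTTGAAGG"))
```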

Step 2. Genetic analysis of humans using ESP probe fragments

The sequence-verified ESP fragments are spotted on micro-arrays for
genetic analysis of human sample DNA. For the preparation of this sample DNA, a pair
of adaptors/primers is used that differs from those attached to the arrayed S- or S+ set of
ESP fragments. From each individual, an undigested control sample and a probe-enzyme-digested
test sample are prepared. The digested test sample is labeled with Cy3 and the
undigested control sample with Cy5; the two are mixed and hybridized to the micro-arrays
as described before. Alternatively, the hybridization mixture may be composed of
differentially labeled test DNA and previously genotyped control DNA, both digested
with the probing endonuclease. In both cases, the Cy3 (test/digested sample) and Cy5
(control/undigested DNA) signal intensities are normalized using a number of
monomorphic control probes. The ratio of these normalized Cy3/Cy5 signals for each
of the ESP probes allows accurate determination of the allelic state of the sample at
each polymorphic site (homozygous S+/S+, homozygous S-/S-, heterozygous
S+/S-).
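For the first hybridization scheme (digested test sample against the undigested sample of the same individual), the ratio thresholds might be applied as follows. The expected ratios follow from the description above; the cut-offs of 0.25 and 0.75 are illustrative assumptions that would in practice be set from the calibration data, and `call_genotype` is a hypothetical name. The second scheme, which compares a digested test DNA with a digested, previously genotyped control, requires interpreting the ratio relative to the control genotype and is not covered by this sketch.

```python
def call_genotype(ratio: float, lower: float = 0.25, upper: float = 0.75) -> str:
    """Allelic state from the normalized Cy3/Cy5 ratio (digested test sample
    over undigested control of the same individual).

    Expected ratios: about 0 for S+/S+ (both alleles cleaved, no Cy3 signal),
    about 0.5 for S+/S-, about 1 for S-/S- (nothing to cleave).
    The cut-offs `lower` and `upper` are illustrative assumptions.
    """
    if ratio < lower:
        return "S+/S+"
    if ratio <= upper:
        return "S+/S-"
    return "S-/S-"


if __name__ == "__main__":
    for r in (0.03, 0.48, 1.02):
        print(r, call_genotype(r))
```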

The micro-array hybridization experiment may in the first place be
performed with the sample DNAs derived from the collection of individuals from which
the ESP probe fragments were isolated. Such an experiment will confirm
the polymorphic nature of the selected probe fragments and allow their testing in a
multitude of measurements. The data will also yield information on the allele frequencies
among an appreciable number of chromosomes.
Example 4

Human Genetic Analysis Using Format-II RAA
As described for corn in Example 2, the format-I ESP assay for human
genetic analysis may be converted to a format-II or a format-III assay. Based on the
sequence of the selected and experimentally validated ESP fragments, it is indeed
possible to design a pair of dedicated, i.e. ESP-specific, PCR primers. Such primers
can be combined in a number of parallel multiplex reactions, which are in turn
combined to obtain the sample DNA [Wang, D. G. et al., Science 280: 1077-1082
(1998)]. This sample DNA is hybridized against a micro-array of spotted S+ ESP
fragments (see Example 3). The experiment is set up such that the fluorescently
labeled ESP-specific primer and the S+ sequences are located on opposite sides of the
polymorphic site. Alternatively, the unlabeled ESP-specific amplification primers may
be arrayed as hybridization probes. The development of a format-II or format-III assay
need not be preceded by the identification of ESP fragments (using one of the methods
described in the previous examples). In the present example, we describe the
development of an RAA assay based on the sequence of previously discovered SNPs.
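The principle behind such an assay, in which an ESP-specific amplicon yields a signal only if it lacks a recognition site for the probing enzyme, can be checked in silico when primers are designed from sequence data. The Python sketch below is illustrative only: the amplicon names and sequences are hypothetical, the HgaI site GACGC is one of the enzyme sites discussed in this example, and cleavage is assumed to be complete whenever the site occurs on either strand between the primers.

```python
# Complement table for reverse-complementing uppercase DNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def carries_site(seq, site):
    """True if the recognition site occurs on either strand of seq."""
    seq, site = seq.upper(), site.upper()
    return site in seq or reverse_complement(site) in seq


def amplified_after_digestion(amplicons, probe_site):
    """Names of amplicons expected to survive digestion and give a signal.

    An amplicon that carries the probe site is cut between its two primers
    before PCR, so it is assumed to drop out of the amplification.
    """
    return [name for name, seq in amplicons.items()
            if not carries_site(seq, probe_site)]


# Hypothetical amplicons from one homozygous sample.
amplicons = {
    "ESP1": "TTAGGACGCATTC",  # carries the HgaI site GACGC
    "ESP2": "TTAGGATGCATTC",  # C->T variant: HgaI site lost
}
print(amplified_after_digestion(amplicons, "GACGC"))  # ['ESP2']
```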

Close inspection of the known SNPs reveals that a significant percentage
of them are associated with both the loss and gain of a restriction recognition site, i.e.
each of the two allelic sequences is associated with a different restriction recognition site.
The single nucleotide substitution may inter-convert recognition sequences that are
identical except for one nucleotide [e.g. PleI (GACTC) and HgaI (GACGC), HgaI and
SfaNI (GATGC), SfaNI and BbvI (GCTGC)]. Alternatively, the allelic recognition
sites may be partially overlapping [e.g. MaeII (ACGTg) and NlaIII (aCATG)]; in the
latter case the inter-conversion depends on the nature of the upstream or downstream
sequences. Such mutually exclusive restriction site allelism offers a distinct advantage.
The RAA technique will normally only detect the allele that is devoid of a recognition
site for the probing enzyme; therefore, determination of the zygosity requires careful
calibration of the signal against that observed with undigested DNA. When each
allele is associated with the presence of its own, mutually exclusive restriction site, two
parallel RAA assays can be performed, each involving digestion with one of the
alternative enzymes.
With such assays, both alleles can be positively identified and the zygosity is readily
determined. The two parallel assays are best performed in a two-color mode; one of
the primers is differentially labeled (e.g. with Cy3 and Cy5 as described previously)
such that the amplification reactions can be mixed and hybridized against a single array
of probes.
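The two ideas in this paragraph, finding an enzyme pair whose sites are inter-converted by a SNP and calling the genotype from two parallel digests, can be sketched together in Python. This is a hedged illustration: the recognition sequences are the standard sites of the enzymes named above, searched on both strands; the flanking-base dependence of the MaeII/NlaIII pair is ignored; the example alleles and the signal threshold are hypothetical; and the channel assignment (Cy3 for the enzyme A digest, Cy5 for the enzyme B digest) is an assumption.

```python
# Recognition sequences (one strand, 5'->3') of the enzymes named in the text.
ENZYME_SITES = {
    "PleI": "GAGTC",
    "HgaI": "GACGC",
    "SfaNI": "GCATC",
    "BbvI": "GCAGC",
    "MaeII": "ACGT",
    "NlaIII": "CATG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def enzymes_cutting(seq):
    """Enzymes whose site occurs on either strand of seq."""
    seq = seq.upper()
    return {name for name, site in ENZYME_SITES.items()
            if site in seq or reverse_complement(site) in seq}


def interconverted_sites(allele_1, allele_2):
    """Enzymes that cut only allele 1 and only allele 2, respectively."""
    cut_1, cut_2 = enzymes_cutting(allele_1), enzymes_cutting(allele_2)
    return cut_1 - cut_2, cut_2 - cut_1


def call_two_digest_genotype(cy3_digest_a, cy5_digest_b, threshold):
    """Genotype from two parallel RAA reactions hybridized to one array.

    Enzyme A cuts allele 1, so Cy3 signal from the A digest reports allele 2;
    enzyme B cuts allele 2, so Cy5 signal from the B digest reports allele 1.
    """
    has_1 = cy5_digest_b >= threshold
    has_2 = cy3_digest_a >= threshold
    if has_1 and has_2:
        return "1/2"
    if has_1:
        return "1/1"
    if has_2:
        return "2/2"
    return "no call"  # neither allele detected: treat as assay failure


only_1, only_2 = interconverted_sites("TTGACGCAA", "TTGATGCAA")
print(only_1, only_2)  # {'HgaI'} {'SfaNI'}
print(call_two_digest_genotype(850, 900, threshold=200))  # 1/2
```

Unlike the single-enzyme assay, a missing signal here is informative only in combination with the other channel, which is why the case where both channels are empty is reported as a failure rather than as a genotype.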

We have systematically explored the SNP database of the Whitehead
Institute for mutational changes that promote restriction site inter-conversions and have
calculated their frequency of occurrence. Two SNP-associated recognition site
inter-conversions were found to occur at high frequency: MaeII -> NlaIII and HgaI ->
SfaNI. In both cases the mutational changes converting one site into the other are C->T
(or G->A) transitions occurring in CpG dinucleotides. This finding is entirely consistent
with the fact that this type of mutation occurs at a 6-8 times higher frequency than other
nucleotide substitutions. Based on the number of SNPs found in the Whitehead database,
we estimate the total number of SNPs in the human genome for the enzyme pairs
MaeII/NlaIII and HgaI/SfaNI at 30,000 and 15,000, respectively. These numbers are
presumably somewhat overestimated, since both MaeII and HgaI are susceptible to CpG
methylation. Consequently, the inter-conversion can only be measured at non-methylated
sites. Therefore, in practice, RAA assays designed on the basis of sequence
data should be validated on a number of test samples. Assays in which no cleavage takes
place at the CpG-containing site in any of the individuals tested should be eliminated
from the RAA bi-allelic marker systems.
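The validation rule at the end of this paragraph amounts to a simple filter over pilot data. In the Python sketch below, the assay names and the per-individual cleavage observations are hypothetical. An assay is retained only if cleavage of its CpG-containing site was observed in at least one tested individual, on the reasoning given above that an assay never showing cleavage cannot distinguish a methylated site from the site-absent allele.

```python
def validated_assays(cleavage_by_assay):
    """Keep assays whose CpG-containing site was cleaved in at least one individual.

    cleavage_by_assay maps an assay name to one boolean per tested individual
    (True if cleavage at the site was observed in that individual).
    """
    return sorted(name for name, observed in cleavage_by_assay.items()
                  if any(observed))


pilot = {
    "MaeII_NlaIII_017": [False, True, False, True],
    "HgaI_SfaNI_042": [False, False, False, False],  # never cut: eliminated
}
print(validated_assays(pilot))  # ['MaeII_NlaIII_017']
```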

The foregoing examples are illustrative of the invention and are not intended to limit
the scope of the invention as set out in the claims. All of the references cited herein are
incorporated by reference.

WE CLAIM:

1. A method for detecting an endonuclease site polymorphism (ESP) in
DNA, the method comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) treating the target DNA fragments obtained in step (b) with a
probe restriction endonuclease reagent;

(d) amplifying the probe restriction endonuclease reagent treated
target DNA fragments of step (c);

(e) analyzing the DNA of step (d) to determine which target
fragments are amplified and/or which target fragments are not amplified; and wherein
target DNA fragments which are amplified lack a recognition site for the probe
restriction endonuclease reagent and target fragments having a recognition site for the
probe restriction endonuclease reagent are not amplified.

2. The method of claim 1 wherein the concomitantly amplifiable target DNA
fragments of step (b) are derived by treatment of the sample DNA with a sampling
restriction endonuclease reagent.

3. The method of claim 2 wherein the concomitantly amplifiable DNA
fragments of step (b) are derived from sample DNA by treatment of the sample DNA
with a first and a second restriction endonuclease reagent.

4. The method of claim 3 wherein said first restriction endonuclease
reagent has a recognition sequence of six or more nucleotides and the second restriction
endonuclease reagent has a recognition sequence of four or fewer nucleotides.

5. The method of claim 3 or 4 wherein said concomitantly amplifiable
target DNA fragments are derived by stepwise treatment of said sample DNA with the
first and the second restriction endonuclease reagents.



6. The method of claim 1 further comprising preparing PCR primers
which flank the endonuclease site polymorphism (ESP) for use in amplifying said
concomitantly amplifiable target DNA fragments.

7. The method of claims 1, 2, 3, and 4 wherein the concomitantly
amplifiable DNA fragments are modified by ligation of adapters to both termini of said
fragments, and wherein said adapters are capable of serving as primers for
amplification.

8. The method of claim 5 wherein the concomitantly amplifiable DNA
fragments are modified by ligation of adapters to both termini of said fragments, and
wherein said adapters are capable of serving as primers for amplification.

9. The method of claim 1 wherein the probe restriction endonuclease 
reagent of step (c) has a recognition sequence comprising six or more nucleotides. 

10. The method of claim 1 wherein the probe restriction endonuclease
reagent of step (c) has a recognition sequence comprising four or more nucleotides.

11. The method according to claim 1 wherein the probe restriction 
endonuclease of step (c) has a recognition sequence of two nucleotides. 

12. The method according to claim 1 wherein the order of steps (b) and
(c) is reversed or steps (b) and (c) are carried out simultaneously.

13. The method according to claim 1 wherein said endonuclease site
polymorphism is an alteration in a concomitantly amplifiable target fragment giving
rise to a nucleotide sequence that is recognized and cut by the probe restriction
endonuclease reagent.
14. The method of claim 1 wherein said endonuclease site polymorphism is an alteration
in the nucleotide sequence of a concomitantly amplifiable target fragment which
eliminates a recognition sequence for said probe restriction endonuclease reagent.

15. The method of claims 1, 2, 3 and 4 wherein said concomitantly
amplifiable DNA fragments are amplified by a polymerase chain reaction.

16. The method of claim 5 wherein said concomitantly amplifiable DNA
fragments are amplified by a polymerase chain reaction.

17. The method of claim 1 wherein amplified target fragments are
identified by their ability to hybridize to cognate probe DNA fragments.

18. A method for obtaining probe DNA fragments for use in detecting
endonuclease site polymorphisms, the method comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) selecting from the target DNA fragments probe DNA fragments
having an endonuclease site polymorphism (ESP) for the probe restriction
endonuclease.

19. The method of claim 17 wherein said probe DNA fragments are derived
by digestion of sample DNA with one or more sampling restriction endonuclease
reagents.

20. The method of claim 18 wherein probe DNA fragments are derived by 
digestion of a pool of sample DNAs obtained from one or more individuals of a 
species. 


21. The method of claim 18 wherein the probe DNA fragments are derived
by digestion of a pool of sample DNAs obtained from 10 or more individuals of a
species.



22. The method of claim 18 wherein the probe DNA fragments are derived by
digestion of a pool of sample DNAs obtained from 50 or more individuals of a
species.

23. The method of any one of claims 19-21 wherein said species is selected
from the group consisting of prokaryotic species and eukaryotic species.

24. A method for obtaining probe DNA fragments for use in detecting
endonuclease site polymorphisms (ESP) comprising preparing synthetic
oligonucleotides based on the nucleotide sequence of amplifiable target DNA fragments
containing endonuclease site polymorphism(s).

25. A method for producing a microarray of probe DNA, the method
comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) selecting probe DNA fragments having restriction endonuclease site
polymorphisms (ESPs) from the sample restriction endonuclease treated target DNA
fragments of step (b); and

(d) arraying the probe DNA fragments obtained in step (c) on a solid
substrate in a predefined region by attaching the fragments to the substrate.

26. The method of claim 24 wherein the DNA fragments of step (b) are 
obtained by treating sample DNA with one or more sample restriction endonuclease 
reagents. 




27. The method of claim 24 wherein said probe DNA fragments of step
(d) are synthetic oligonucleotides which correspond to the concomitantly amplifiable
target DNA fragments derivable from said sample DNA and containing an
endonuclease site polymorphism (ESP).

28. The method of claim 25, 26 or 27 wherein the solid support is selected 
from a group consisting of a planar solid support, a bead, a sphere and a polyhedron. 

29. The method of claim 25 wherein the microarray comprises at least 2,000
probe fragments.

30. The method of claim 26 wherein the microarray comprises at least 2,000
synthetic oligonucleotides.

31. The method of claim 27 wherein the microarray comprises at least 2,000
probe fragments.

32. The method of claim 28 wherein the microarray comprises at least 2,000
probe fragments.

33. The method of claim 25 wherein the microarray comprises at least
20,000 probe fragments.

34. The method of claim 26 wherein the microarray comprises at least
20,000 synthetic oligonucleotides.

35. The method of claim 27 wherein the microarray comprises at least
20,000 probe fragments.

36. The method of claim 28 wherein the microarray comprises at least
20,000 probe fragments.
SEQUENCE LISTING



<110> METHEXIS N.V. 

<120> RESTRICTED AMPLICON ANALYSIS 

<130> 29314/34158A 

<140> 

<150> 60/107,293 
<151> 1998-11-09 

<160> 28 

<170> PatentIn Ver. 2.0

<210> 1 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 1 

ctcgtagact gcgtacc 17 

<210> 2 
<211> 18 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction.

<400> 2 

aattggtacg cagtctac 18 

<210> 3 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 



<400> 3 

gacgatgagt cctgag 16
<210> 4
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<400> 4 

tactcaggac tcat 14

<210> 5 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature 
<222> (17) 

<223> At position 17 N = A, C, G, or T 
<220> 

<221> misc_feature
<222> (18) 

<223> At position 18 N = A, C, G, or T
<400> 5 

gactgcgtac caattcnn 18 

<210> 6 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature 
<222> (17) 

<223> At position 17 N = A, C, G, or T
<220>

<221> misc_feature
<222> (18)

<223> At position 18 N = A, C, G, or T
<220>

<221> misc_feature
<222> (19)

<223> At position 19 N = A, C, G, or T
<400> 6 

gatgagtcct gagtagnnn 19 

<210> 7 
<211> 21 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 7 

ctcgtagact gcgtacatgc a 21 

<210> 8 
<211> 14 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 8 

tgtacgcagt ctac 14 

<210> 9 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 9 

gactgcgtac atgcag 16 

<210> 10 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 10 

gatgagtcct gagtag 16 



<210> 11 
<211> 21 
<212> DNA

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer
<400> 11 

gagcatctga cgcatgttgc a 21 

<210> 12 
<211> 14
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 



<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.



<400> 12

acatgcgtca gatg 14 

<210> 13 
<211> 16 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 



<400> 13 

ctgctactca ggactg 16 

<210> 14 
<211> 14 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.



<400> 14 

tacagtcctg agta 14 

<210> 15 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 
<400> 15 

ctgacgcatg ttgcag 16 

<210> 16 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 16 

ctactcagga ctgtag 16 

<210> 17 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 17 

ctcgtagact gcgtacccat 20 

<210> 18 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 18 

gggtacgcag tctac 15 

<210> 19 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 19 

gactgcgtac ccatta 16 



<210> 20 
<211> 20 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 20 

gagcatctga cgcatgggat 

<210> 21 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<220> 

<223> Description of Artificial Sequence: primer 
<400> 21 

cccatgcgtc agatg 

<210> 22 
<211> 16 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: primer 
<400> 22 

ctgacgcatg ggatta 

<210> 23 
<211> 16 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: primer 
<400> 23 

gtcctcatcg agcatg 

<210> 24 
<211> 14 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.
<220>

<223> Description of Artificial Sequence: primer 



<400> 24

cgcatgctcg atga 14



<210> 25 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 25 

cctcatcgag catgcg 16 

<210> 26
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: primer
<400> 26

gagcatctga cgcatcc 17

<210> 27
<211> 18
<212> DNA

<213> Artificial Sequence
<220>

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<220>

<223> Description of Artificial Sequence: primer
<400> 27

aattggatgc gtcagatg 18

<210> 28
<211> 16
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: primer
<400> 28

ctgacgcatc caattc 16
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RESTRICTED AMPLICON ANALYSIS 



Field of the Inventinn 

The present invention generally provides a method which facilitates the 
detection of polymorphisms (or mutations). ITie method is directed to the analysis of 
so-called endonuclease site polymorphisms (ESPs) that result in the gain or loss of a 
restriction endonuclease site. In essence, the ESP is probed with the restriction 
endonuclease reagmt prior to amplification, whereby amplification is prevented and 
consequentiy no signal is observed when cleavage takes place. Unambiguous allele 
calling is performed by comparing the signals obtained with and without cleavage with 
the restriction endonuclease reagent. The method is particularly useful for multiplex 
genotyping, involving the parallel analysis of large numbers of single nucleotide 
polymorphisms. Preferred methods for detecting the amplicons involve hybridization 
to an arrayed or otherwise identifiable set of cognate probe fragments or 
oligonucleotides. 

Background of the Inventinn 

Molecular s^roaches for genetic analyses trace the nucleotide sequence 
variation that occurs naturally and randomly in the genomes of all living species. 
Knowledge of the DNA polymorphisms among individuals and between populations 
is important in understanding the complex links between genotypic and phenotypic 
variation. In the absence of complete data about sequence variation, one relies on the 
ability to identify 'nearby' markers tiiat allow to infer tiie location of certain relevant 
loci or causal sequence variations. The informativeness of tiie marker depends on the 
magnitude of the linkage disequilibrium. Markers can be used in linkage studies to 
search for candidate genes and in association studies to identify the functional allelic 
variation on candidate genes fliat influence inter-individual variation. 

The vast majority of sequence variation consists of nucleotide 
substitutions, often referred to as single nucleotide polymorphism's (SNPs), resulting 
from mutations that have accumulated during evolution. Most of these nucleotide 
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changes are genetically silent; i.e., they have no measurable biological effect, but 
provide an immense reservoir of variation in DNA structure. Most methods for genetic 
analysis used today rely on the detection of nucleotide sequence variation which can 
be measured by DNA fragment analysis using electrophoretic separation, in which 
DNA fragments are fractionated based on size or conformation. Occasionally the 
nucleotide sequence variation will affect either the presence of the DNA fragment or 
its mobility. In this way the primary nucleotide sequence variation will give rise to 
easily detectable DNA fragment polymorphism. Since polymorphic DNA ftagmMts 
are derived from predse locations on the organism's genome, they can serve as reliable 
genetic markers, or landmarks to identify and locate genes. 

A host of assays to detect DNA polymorphisms, and SNPs in particular, 
have been developed. In some of these assays (e.g. , RFLP [Botstein, D. , White, R.L. , 
Skohiich, M., Davis, R.W., Am. /. Hum. Genet. 32:314-331 (1998)], CAPS 
[Konieczny, A. Ausubel, J.F., PkmtJ. 4:403-410 (1993)], dCAPS [Neff, M.M. Neff, 
J.D., Chory, J., Pepper, A.E., The Plant Journal 14:387-392 (1998)], PIRA 
[Steinbom, R., MuUer, M., Brem, G., Biochim. Biophys. Acta 1397:295-304 (1998)]), 
restriction enzymes are used to detect polymorphic nucleotide sequences that affect 
cleavage. The specificity of restriction enzymes is such that they exhibit a unique 
sensitivity to detect single nucleotide differences occurring in their recognition sites. 
The princ^ strengths of restriction ^izym&-based genetic analyses are the ease of use 
and the robustness of the assays. In the majority of the cases, the restriction site 
polymorphism is used to detect known, previously identified SNPs and the assay 
consists of any electrophoretical fragment analysis. In one report, the allelic variation 
is detected in a solid-phase EUSA-type setting [Truett, G.E. , Walker, J. A., Wilson, 
I.B., Redmann, S.M. Jr., IWley, R,T., Eckardt, G.R., Plastow, G., Mamm. Genome 
9:629-632 (1998)]. 

In WO 91/17269, Lemer et al. describe a different method for mapping 
a eukaryotic chromosome by restriction endonuclease mapping of discrete DNA 
sequences which are complementary to a region of a eukaryotic chromosome. 

Vos et al. , Nucl Acids Res. 23:4407^14 (1995) and EP 0 534 858 
describe a technique for DNA fingeiprinting called AFLP which is based on the 
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selective polymerase chain reaction based application of restriction fragments of a 
digest of genomic DNA. The application reaction dqjends on the use of primers that 
extend into restriction fragments amplifying only those fragments in which prior 
extensions match the nucleotide sequence flanking the restriction sites. 

Another method utilizing DNA amplification steps is set out in Williams 
et al., Nucl Adds Res. 18:6531-6535 (1990), who describe a DNA fingeipiintmg 
method termed random amplified polymorphic DNA. 

DNA amplification fingeiprinting was described by Caetano AnoUes in 
Bio/Technology 9:553-557 (1991). Still another fingeiprinting technique called 
arbitrarily primed PGR was described in Welsh et al , NucL Adds Res. 18:7213-7218 
(1990) and Welsh et al , NucL Adds Res. 19:861-866 (1991). 

In WO 94/11530, Cantor et al. describe materials and methods for 
position and sequencing by hybridization. Cantor et al. also describe methods for 
creating assays of DNA probes useful in the practice of their method. 

The major shortcoming of the cunrat methods of genetic analysis is the 
limited resolution of the DNA fragment analysis systems, namely the number of DNA 
fiagmmts tiiat can be sqiaiated in a single assay. Gmerally the firactionation resolution 
ranges from tms to a couple of hundred DNA fragments, at the most. Consequentiy, 
current genetic analysis methods are limited to a few hundred to a thousand genetic 
mariners. While this resolution has been sufficient for analyzing simple genetic txaits 
determined by single genes, the analysis of complex traits, which is now being 
undertaten and which involve gMeral or many different genes, will require the analysis 
of a much larger number of graetic markers. It is anticipated that such studies will 
require from a few thousand to possibly several hundred thousand genetic markers. 
Although this could conceivably be accomplished by performing many parallel assays, 
such scaling up will be cost- and labor prohibitive. 

A technology that has great potential and which is generating widespread 
interest in the so-called micro-array technology (DNA chips). In general, tiiese 
methods are based on measur^ent of the hybridization of DNA sequences in solution 
to probe sequences that are arrayed on a solid suriiace. When assaying nucleotide 
polymorphisms, die detector lelies on the small differences in hybridization efficiency 
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between two different DNA sequences. In one format, fluorescently labeled sample 
DNA is hybridized to dense anays of probe nucleic acids, sequence-specific 
hybridization signal is detected by scanning confocal microscopy, and DNA variants 
scored as (predictable) differences in the hybridization pattern. The micio-amys are 
fabricated either by in-situ light-directed oligonucleotide synthesis [Fodor, S.P. A. et 
al. Science 251: 767 (1991)] or by spotting DNA (off-chip synthesized 
oUgonucleotides or PCR fragments) in an automated procedure. The technology has 
already been demonstrated in the scoring of mutations in mitochondrial DNA [Chee, 
M. et al. Science TJA: 610-614 (1996)], the HIV genome [Lipshutz, R.J. et al., 
Biotechniques 19: 442-447 (1995)], the CFTR cystic fibrosis gene [Cronin, M.T. et 
al.. Human Mut.7: 244-255 (1996)], the BRCAl breast cancer gene (Hacia, G.H. et 
al.. Nat. Genet. 14: 441-447 (1996)] as well as the entire yeast genome [Winzeler, 
E.A. etal.. Science 2Sl:im (1998)]. In comparison with most other assays, micro- 
arrays provide a platform for high-throughput, massively paiaUel polymoiphism 
detection. 

A major disadvantage with the use of microarrays relates to the 
complexity of the hybridization reaction. The detection reUes on the very small 
difference in hybridization of DNA sequences differing by only one nucleotide. In 
general, a set of 4 oligonucleotides, differing only in the identity of the central base, 
is synthesized for each position in the target sequence that has to be interrogated. In 
practice, the number of oligonucleotides needed to correctty genotype one SNP is much 
larger, involving up to 56 diflfereait oligonucleotides spanning the variable base [Wang 
etaL. Science 280: 1077-1082 (1998)]. The degree of redundancy is also dramatic if 
one wants to screen the target DNA for all possible mutations; the design then includes 
overlapping oUgonucleotide-sets that are offset by one base (a process known as tiling). 
It should be noted that die detection of SNPs by hybridization to arrays depends on the 
use of short oligonucleotide probes. With longer probes such as DNA fragments in the 
size range of 50 to 500 base pairs or larger, it is not possible to distinguish the SNP 
alleles. 



wo 00/28081 



PCT/IB99/01958 



Siininiary nf f hft TnvPnf Inn 

The present invention is directed to methods for genotyping 
polymorphisms that result in the gain or loss of an endonuclease cleavage site. Such 
S polymoiphisms are lefened to hereinafter as endonuclease site polymoiphisms (ESPs). 
Polymoiphisms detectable accoiding to the methods of the preset invention include 
single nucleotide polymoiphisms (SNPs). The methods of the preset invention exploit 
the high discriminatory power of restriction enzymes in a "Restricted Amplicon Assay" 
(RAA) which generally comprises the following steps (see Figure 1): 
10 (a) isolating sample DNA; 

(b) derivingia set of target DNA fragments, said set of target 
fragments comprising concomitantly amplifiable target DNA fragments from the 
sample DNA; 

(c) treating the target DNA fragments obtained in step (b) a probe 
1 5 restriction endonuclease reagent; 

(d) amplifying the amplifiable probe restriction endonuclease reagent 
treated target DNA fragments of step(c); and 

(e) analyzing the DNA of step (d) to determine which target 
fiagmrats are amplified and/or which target fiagm^its are not amplified; and wherein 

20 amplified target fragments lack a recognition site for the probe restriction endonuclease 
reagent and target fragments having a recognition site for a probe restriction 
endonuclease reagent are not amplified. 

In one aspect, the preset invention is directed to RAA-methods, which 
comprise the preparation of concomitantly amplifiable DNA segmrats by digestion of 

25 flie starting DNA with one or more restriction endonucleases, collectively referred to 
herein as sampling enzymes. This method is herein referred to as format-I RAA and 
is diagrammed in Figure 2. The digested starting DNA may be further modified at its 
termini by the addition of adapters, which may serve to prime an amplification reaction 
(see Figure 2). Once sample DNA is obtained, it is treated with a different restriction 

30 enzyme, the probing enzyme also referred to as a probe restriction endonuclease 
reagent. A combination of probing and sampling enzymes are chosen such that a 
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substantial fraction of the sample fragments contain a single recognition site for the 
probe endonuclease reagent. In general, probe enzymes used with format-I RAA 
preferably have as a recognition site a nucleotide sequence of less than six nucleotides. 

In another aspect, the present invention is directed to methods for 
format-n RAA for the detection of ESPs, as diagrammed in Figure 3. Format-U RAA 
opiates on the same principal as fonnat-I RAA excq>t that the sample amplicons need 
not be DNA fragments, but are rather defined regions of a genome amplifiable with 
specific prim^ pairs. Hie amplicons of ttie foimat-n RAA are identified on the basis 
of sequence data; e.g. the sequence of ESP-containing restriction fragments identified 
usmg format-I RAA mediod or otherwise known SNPs affecting CTdonuclease cleavage 
sites. In fonnat-n RAA, the test DNA to be analyTed is treated with a probe restriction 
endonuclease reagent, followed by the concomitant amplification of regions of the 
treated DNA (amplicons) using predetermined primers using, for example, the 
polymerase chain reaction as described herein. The analysis of the amplification 
products tiien proceeds as described in the format-I RAA mefliods described herein. 
As with format-I RAA, an ESP is genotyped by the presCTce or absence of a 
recognition site for the probe restriction endonuclease reagent. 

In yet another aspect, the present invention is directed to methods for 
format-in RAA. In essence, format-m RAA consists of a combination of the format-I 
and format-n ^roaches. One of such combinations is diagrammed in Figure 4. Test 
DNA, digested or not with a probe endonuclease reagent, is sampled with a pair of 
endonuclease reagMts and the resultmg fragments are co- as described in tiie format-I 
assay amplified (this stq) is referred to as the pre-amplification step). These pre- 
amplification mixtures are, in turn, used as templates for a format-n type of PGR 
reaction in which multiple ESP-containing regions are selectively co-amplified using 
specific primer sets. The analysis of the amplification products tiien prx)ceed5 as 
described before. The advantages of fonnat-m RAA are that the stepwise 
amplification facilitates the multiplex PGR of the ESP-s5)ecific amplicons and lowers 
the amount of starting material required to interrogate all the ESPs. 

Arrays, or microarrays, of probe DNAs, wherein the probe DNAs are
useful in the detection of ESPs, are also encompassed by the present invention.
Informative probe DNAs are prepared and identified as described in detail below and
are then attached to a substrate for use in hybridization reactions with concomitantly
amplifiable DNA after treatment with a probe restriction endonuclease reagent and
subsequent amplification.

Since the method of the invention is based on the detection of a
particular kind of DNA polymorphism, which occurs in the DNA of any organism, the
invention will be universally applicable. The methods of the present invention may be
used to genotype ESPs in a wide variety of organisms, from prokaryotic organisms
such as bacteria, through complex eukaryotic organisms, viruses, or any organism
having a genome, however simple or complex. The methods may also be used for the
analysis of extrachromosomal DNA, the DNA found in certain cellular organelles,
cDNA preparations, or DNA libraries, such as yeast artificial chromosome libraries
and others. Furthermore, based on the large body of DNA sequence data at hand, it
is predicted that the genomes of higher organisms carry several hundreds of thousands
of such DNA polymorphisms. Consequently, the new method is capable of diagnosing
the immense number of genetic markers that are needed to unravel complex traits. The
method is of tremendous value for high-throughput genetic analysis in the emerging
field of pharmacogenomics. Similarly, the method has great potential in the field of
animal and plant breeding, where high-resolution genetic analysis will be needed to
identify the genes involved in quantitative agronomic traits.

Various aspects of the present invention are described in more detail
below (see Detailed Description of the Invention). Variations in each of these aspects
will be readily appreciated by one of ordinary skill in the art and are within the scope
of the invention.

Brief Description of the Drawings
Figure 1 depicts the general concept of the Restricted Amplicon Assay.
The vertical arrows indicate the positions of the ESPs. The open circles denote the
probing enzyme sites that are present, while the closed circles denote the mutated sites.
The first step involves cleavage of the test DNA with the probing endonuclease. The
second step involves PCR amplification of DNA segments comprising the ESPs. The
small horizontal arrows denote the PCR primers flanking the ESPs. When cleavage
occurs, the DNA is cut between the PCR primers, preventing the subsequent
amplification of the DNA segment comprising that ESP. Only those DNA segments
that were not cleaved are amplified. The final step comprises assaying the amplicons.

Figure 2: Diagrammed representation of format-I RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the
sampling enzyme cleavage sites. Step 2 represents the adapter ligation step. The open
lines represent the adapters ligated to the ends of the sampled restriction fragments.
Step 3 represents the probing enzyme cleavage step, and the small horizontal arrows
denote the PCR primers matching the adapter sequences. Step 4 represents the PCR
amplification step in which only the sample fragments that are not cleaved by the
probing enzyme are amplified. The crossed circles represent the fragments that are not
amplified.

Figure 3: Diagrammed representation of format-II RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
probing enzyme cleavage step. The dotted boxes denote the DNA sequences flanking
the ESP sites. Step 2 represents the PCR primer design. The small horizontal arrows
denote the PCR primers flanking the ESPs. Step 3 represents the PCR amplification step
in which only the sample fragments that are not cleaved by the probing enzyme are
amplified. The crossed circles represent the fragments that are not amplified.

Figure 4: Diagrammed representation of format-III RAA. The vertical
arrows indicate the positions of the ESPs, with the open and closed circles denoting the
probing enzyme sites that are respectively present and absent. Step 1 represents the
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the
sampling enzyme cleavage sites. Step 2 represents the pre-amplification step in which
the sampled fragments are amplified. Step 3 represents the probing enzyme cleavage
step. Step 4 represents the PCR primer design. The small horizontal arrows denote the
PCR primers flanking the ESPs. Step 5 represents the PCR amplification step in which
only the sample fragments that are not cleaved by the probing enzyme are amplified.
The crossed circles represent the fragments that are not amplified.

Figure 5: Graphic representation of target fragments produced by
cleavage with a hexacutter (full arrows) and a tetracutter (dotted arrows) restriction
enzyme. Two types of fragments are produced: type I fragments (dotted lines) carrying
two tetracutter ends and type II fragments (full lines) carrying one hexacutter end
(represented by the arrowhead) and one tetracutter end. Upon PCR amplification only
the type II fragments are amplified.

Figure 6: EcoRI-BfaI fragments from ecotypes Columbia (C) and
Landsberg (L) obtained after selective amplification using EcoRI and BfaI AFLP
primers with respectively 2 and 3 selective nucleotides. The fragment patterns were
obtained respectively without probing enzyme (no enzyme) and after digestion with the
MseI probing enzyme. Note that most of the larger fragments do not survive MseI
digestion, while the majority of the smaller fragments survive the treatment. The
differences between the ecotypes Columbia (C) and Landsberg (L) observed after MseI
digestion, marked by the arrows, represent ESP-carrying fragments. The differences
found without MseI digestion, marked by the stars, represent typical AFLP
polymorphisms.

Figure 7: False color hybridization patterns obtained on the Arabidopsis
micro-arrays. The layout of the Arabidopsis micro-array is as follows: the left panel
contains the ESP fragment probes derived from Columbia (upper half) and Landsberg
(lower half), while the right panel contains the control monomorphic probes, with
respectively the negative control fragments (-control) always carrying a probing
endonuclease site and the positive control fragments (+control) carrying no probing
endonuclease site. The upper part of the figure shows the hybridization patterns
obtained with uncleaved sample DNA, while the lower part of the figure shows the
hybridization patterns obtained with cleaved sample DNA. The false color code is as
follows: green represents hybridization with the Cy3-labeled fragments, red represents
hybridization with the Cy5-labeled fragments, yellow represents hybridization with
both the Cy3-labeled and the Cy5-labeled fragments, and gray represents no
hybridization. In this figure, a set of idealized results is presented. The hybridization
patterns with the uncleaved sample DNA show that all probes detect sequences in both
ecotypes, while the hybridization patterns with the cleaved sample DNA show that the
ESP fragment probes detect only the sequences in the respective ecotypes from which
the ESP fragments were isolated. In addition, fragments carrying no site for the
probing enzyme detect sequences in both ecotypes, while fragments that always carry
a site for the probing enzyme do not show a hybridization signal.

Figure 8: False color hybridization patterns obtained on the corn
micro-arrays. The layout of the corn micro-array is as follows: the left panel of probes
contains random fragments derived from B73, while the right panel contains Mo17
fragments. The figure shows four hybridization patterns obtained with respectively
uncleaved sample DNA, and MseI-cleaved, Tsp509I-cleaved and AluI-cleaved
sample DNA. The uncleaved sample DNA hybridization pattern shows probes that
hybridize only to B73 (green) or only to Mo17 (red) fragments, which represent
polymorphisms resulting from mutations in the sampling enzyme recognition sites. The
cross in the circle indicates that these probes are eliminated from the analysis. The
cleaved sample DNA hybridization patterns show that the majority of the probes do not
give a hybridization signal, indicating that their cognate fragments are cleaved by the
probing enzyme. Most of the probes giving a signal hybridize to both sample DNAs.
Those that hybridize to only one of the sample DNAs and that were not eliminated
represent fragments carrying ESPs. The white arrows denote the probes that were
retained for further analysis.

Detailed Description of the Invention
The term "SNP" means Single Nucleotide Polymorphism, i.e. a
polymorphism involving the mutation of a single base-pair.

The term "ESP" means Endonuclease Site Polymorphism, i.e. a
polymorphism involving two alleles, one of which is cleaved by an endonuclease
reagent while the other exhibits (at least partial) resistance to cleavage by the same
endonuclease under the same conditions.
The phrase "(restriction) endonuclease reagent" refers to a reagent that
consists of one or more enzymes and that cleaves nucleic acids with a certain
specificity, i.e. cleavage involves recognition of a particular sequence or set of
sequences in the target DNA. Endonuclease reagents include but are not limited to the
common type II restriction enzymes.

The term "sampling endonuclease(s)" or "sampling enzyme(s)" refers to
an endonuclease reagent used to derive sets of fragments from the sample DNA.

The term "probing endonuclease(s)" or "probing enzyme(s)" refers to an endonuclease
reagent used to probe the allelic state at specific ESP-sites.

The term "polymorphism" refers to the existence of two or more alleles
at significant frequencies (≥1%) in the population; polymorphism at a single
chromosomal location constitutes a genetic marker.

The term "micro-satellite (DNA)" refers to a small array (often less than
0.1 kb) of tandem repeats of a very simple sequence, often 1 to 4 base-pairs. Variability
at such a locus is the basis of many genetic markers.

The term "mutation" means a heritable alteration in the DNA sequence. 

The term "allele" refers to one of several alternative sequence variants 
at a specific locus. 

The term "genotype" is commonly known to mean (i) the genetic 
constitution of an individual, or (ii) the types of allele found at a locus in an individual. 

The term "haplotype" refers to the genotype at a series of linked loci on
a single chromosome. 

The term "sample DNA" or "sample fragments" refers to the set of 
fragments or amplicons derived from the starting DNA by the RAA method.

The term "zygosity" refers to the homozygous or heterozygous state.

The term "homozygosity/homozygous" refers to the presence of identical
alleles at a locus. 

The term "heterozygosity/heterozygous" refers to the presence of 
different alleles at a locus. 

The term "CpG" means a dinucleotide with a cytosine at the 5'-side and
a guanine at the 3'-side. CpG is relatively rare in mammalian DNA because of the
tendency for the cytosine to be methylated and subsequently mutate to thymine by
deamination.

The term "ecotype" refers to a naturally occurring (plant) variety; race.

The term "bi-allelic" refers to a polymorphic locus characterized by two
different alleles.

The terms "microarray" and "(DNA-)chip" refer to a multitude of 
spatially addressable nucleic acids that serve as probes. The microarray may be used 
in the form of a planar solid support, a bead, a sphere, or a polyhedron. Fabrication
is done either by in situ combinatorial synthesis of oligonucleotides using 
photolithography, or by robotic spotting of off-chip prepared DNA onto a solid 
surface. 

The methods of the present invention differ conceptually from
previously described restriction enzyme-dependent assays (supra) that essentially detect
a fragment length polymorphism. With the present method, starting DNA is restricted
prior to the amplification reaction and, rather than analyzing the obtained amplification
product, the presence or absence of amplification is measured to determine the allelic
state at an ESP site. The treated DNA is preferably amplified by using a polymerase
chain reaction and is preferably analyzed by means of hybridization against arrays of
probe DNAs. With the present method, a sample amplicon, and consequently a
hybridization signal, is either present or virtually absent. This feature represents a
major advantage because it results in a more accurate distinction between variable
nucleotides than is possible by differential hybridization to allele-specific
oligonucleotides, and because it greatly facilitates the identification of a set of generally
useful hybridization conditions. Also, the methods of the invention permit the use of
both oligonucleotides and DNA fragments as probe DNAs. While hybridization
to arrays allows the simultaneous analysis of a large number of ESPs, it should be clear
that the amplification of sample DNA, treated with probe restriction endonuclease
reagent, can be analyzed by any of a variety of methods well known in the art. In these
methods, an ESP is identified either by the presence of a recognition site for the probe
restriction endonuclease reagent (which will result in the failure of the sample DNA to
amplify) or by the loss of a recognition site (which will allow amplification of an
otherwise unamplifiable sample DNA). Alternative methods include, but are not limited
to, gel-electrophoretic analysis and the TaqMan assay [Holland P. M. et al., Proc.
Natl. Acad. Sci. 88: 7276-7280 (1991); with the latter assay, detection is done during
rather than after the amplification reaction].

One of the advantages of the method of the invention is the ability to
calibrate the measured signal against that obtained in a control experiment where
digestion with the probe restriction endonuclease reagent is omitted. Comparison of the
respective hybridization signals, following various corrections and normalization
procedures, is essential for the genotyping of ESPs and the accurate determination of
the zygosity. The cleaved and uncleaved material can, in principle, be hybridized
separately, but a preferred method consists of hybridizing a mixture of the differentially
labeled samples to the same array. The present invention is exemplified by several
specific formats described below.
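The role of the uncleaved control in zygosity calling can be illustrated with a minimal sketch. It assumes that the cleaved and uncleaved signals for a probe have already been corrected and normalized (e.g. for Cy3/Cy5 dye differences) and that amplicon yield is roughly proportional to the number of uncleaved template copies; under those assumptions the cut/uncut ratio is expected near 0, 0.5 or 1 for the three genotypes. The function name and the threshold values `absent_max` and `present_min` are hypothetical placeholders, not values given in this description; in practice they would be calibrated on control probes such as the +control and -control fragments of Figure 7.

```python
def call_zygosity(cut_signal: float, uncut_signal: float,
                  absent_max: float = 0.2, present_min: float = 0.8) -> str:
    """Call the genotype at one ESP from normalized hybridization signals.

    cut_signal:   signal of the sample digested with the probing enzyme
    uncut_signal: signal of the same sample without probing-enzyme digestion
    absent_max, present_min: hypothetical ratio thresholds (assumptions)
    """
    if uncut_signal <= 0:
        # no reference signal: the probe cannot be scored for this sample
        raise ValueError("no signal without cleavage")
    ratio = cut_signal / uncut_signal
    if ratio <= absent_max:
        # (almost) no amplicon survives: both alleles carry the probing site
        return "homozygous, site present"
    if ratio >= present_min:
        # the full signal survives: neither allele carries the probing site
        return "homozygous, site absent"
    # intermediate signal: one allele is cleaved, the other is amplified
    return "heterozygous"


print(call_zygosity(0.05, 1.0))  # homozygous, site present
print(call_zygosity(0.48, 1.0))  # heterozygous
print(call_zygosity(0.97, 1.0))  # homozygous, site absent
```

Dividing by the uncleaved signal is what makes the call robust to probe-to-probe differences in amplification and hybridization efficiency, since those affect numerator and denominator alike.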

(I) Format-I RAA

Choice of sampling and probing restriction endonuclease reagents. In one of its
embodiments the present invention is directed to methods for detecting ESPs in a
"restricted amplicon assay" (RAA), which comprises preparing concomitantly amplifiable
restriction fragments from the starting DNA (sample DNA). When generating discrete
sets of DNA fragments from genomic DNA, the following parameters are important: the
average fragment size and the total number of fragments. The optimal fragment size for
use in the methods (and materials) of the present invention is a trade-off: the fragments
must be sufficiently small to be amplified with roughly equal efficiency (in general
≤500 base pairs) and large enough to carry on average one cleavage site for the probing
endonuclease reagent. In addition to the average fragment size, the number of fragments
determines the complexity of the sample DNA, which is critical in view of the limited
detection sensitivity of micro-array hybridization. In general, the current state of the
art of microarray hybridization is such that the number of sample fragments should not
exceed 100,000. All of the above-mentioned requisites can be met by the appropriate
choice of sampling and probing enzymes.

A preferred method of the present invention to prepare sample DNAs (amplicons)
involves the use of two different sampling enzymes, a rare cutter endonuclease (e.g.,
a hexacutter) combined with a frequent cutter endonuclease (e.g., a tetracutter), as
described in EP 0 534 858 A1, which describes a method called AFLP and which is
incorporated herein by reference. As can be seen from Figure 5, the rare cutter enzyme
produces large fragments that, upon cleavage with the frequent cutter enzyme, are cut
into a number of smaller fragments. This dual cleavage generates two types of
fragments: the majority having both ends produced by the frequent cutter (type I) and a
minority having a rare cutter end and a frequent cutter end (type II). After ligating
different adapters to each of the ends and using appropriate primers targeted to the ends
of the fragments, only the type II fragments will be amplified efficiently (see Figure 5).
The type I fragments amplify with greatly reduced efficiency, presumably because the
synthetic sequences at the two ends constitute an inverted repeat. In general, the type II
fragments will amplify synchronously using a single PCR primer pair that attaches to the
ends of the fragments. The size limit is typically around 500 base pairs, but can be
increased by using a different DNA polymerase and other reaction conditions. Thus, as
outlined above, the number of amplifiable fragments will be determined primarily by the
choice of the rare cutter restriction enzyme. By approximation, this number equals two
times the number of cleavage sites for the rare cutter. In a preferred embodiment,
restriction enzymes recognizing 6 nucleotides (hexacutters) or more are used as rare
cutters. The use of a frequent cutter recognizing 4 nucleotides (tetracutter) as second
sampling enzyme results in the production of fragments in the optimal size range for
co-amplification.

As probe restriction endonuclease reagents, different tetracutter or pentacutter enzymes
can be used. The probe restriction endonuclease reagent and the frequent cutter
sampling enzyme should preferably be chosen such that the ratio of the cleavage
frequencies of probing over sampling reagent is >0.5 and <3. This will ensure that a
substantial fraction of the target fragments are cleaved once by the probing enzyme.
Note that ESPs cannot be genotyped when the fragments are cleaved more than once by
the probing enzyme. Also, it should be recognized that cleavage with the probe
restriction endonuclease reagent results in a significant reduction (typically 2-4 fold) of
the fragment complexity.
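The size and complexity requisites above lend themselves to back-of-the-envelope arithmetic. The sketch below assumes a random genome sequence with equal base frequencies, so that a recognition site of n bases occurs once every 4^n bp on average; real genomes deviate from this (base composition, CpG depletion), so actual counts should be checked against sequence data. The genome size used is a hypothetical example value and all function names are illustrative; the factor of two per rare-cutter site and the 0.5 to 3 ratio window are taken from the text.

```python
import math


def expected_sites(genome_size_bp: float, site_length: int) -> float:
    """Expected number of sites of a given length, assuming a random sequence
    with equal base frequencies (one site per 4**n bp)."""
    return genome_size_bp / 4 ** site_length


def expected_amplifiable_fragments(genome_size_bp: float,
                                   rare_cutter_length: int = 6) -> float:
    """Type II fragments: about two per rare-cutter site, one on each side."""
    return 2 * expected_sites(genome_size_bp, rare_cutter_length)


def probe_to_sampling_ratio(probe_length: int,
                            frequent_cutter_length: int = 4) -> float:
    """Cleavage frequency of the probing enzyme over that of the frequent cutter."""
    return 4.0 ** -probe_length / 4.0 ** -frequent_cutter_length


def fraction_cut_once(fragment_length_bp: float, probe_length: int) -> float:
    """Poisson probability that a fragment carries exactly one probing site,
    i.e. the fraction of fragments whose ESP can be genotyped."""
    mean_sites = fragment_length_bp / 4 ** probe_length
    return mean_sites * math.exp(-mean_sites)


genome = 1e8  # hypothetical genome size in bp
n = expected_amplifiable_fragments(genome)
print(round(n))                              # 48828
print(n <= 100_000)                          # True
print(0.5 < probe_to_sampling_ratio(4) < 3)  # True
print(round(fraction_cut_once(256, 4), 3))   # 0.368
```

Under this random-sequence model a tetracutter probe gives a ratio of 1 against a tetracutter sampling enzyme, and a fragment of average tetracutter length (256 bp) carries exactly one probing site with probability of about 0.37.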

Alternative schemes, different from the one described above, that meet
the requisites of sample complexity, average fragment size, and occurrence frequency
of the probe reagent and that will perform equally well, will be readily apparent to one
of ordinary skill in the art. Alternative schemes may include the use of pairs of
frequent cutters, followed by selective amplification (described in EP 0 534 858 A1),
or the use of type IIS restriction enzymes. Type IIS restriction enzymes are
characterized by an asymmetric recognition sequence. Most of these enzymes cleave
at a defined distance to one side of the recognition site and generate single-stranded
overhangs that have different sequences. Ligation of adaptor sequences that are
complementary to only one type of overhang allows the amplification of specific
subsets of fragments [Kikuya Kato, Nucleic Acids Res. 23: 3685-3690 (1995)]. With
this strategy the set of fragments obtained with the sampling enzymes can be broken
up into a defined number of complementary and roughly equally complex subsets. Thus,
with these enzymes it is possible to tune the complexity of the sample. The same
strategy can be applied by making use of type II enzymes that have an interrupted
palindromic recognition sequence.
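The subsetting idea can be sketched as a simple grouping step. This is a simplification that considers only the one fragment end to which the selective adaptor ligates; the fragment identifiers and overhang sequences are hypothetical, and the sketch does not model ligation efficiency or the second fragment end.

```python
from collections import defaultdict


def partition_by_overhang(fragments):
    """Group fragments by the overhang left at their adaptor-ligated end.

    fragments: iterable of (fragment_id, overhang) pairs, where `overhang` is
    the single-stranded sequence a type IIS enzyme leaves at that end.
    Each group is the subset that an adaptor complementary to that one
    overhang would allow to be amplified.
    """
    subsets = defaultdict(list)
    for fragment_id, overhang in fragments:
        subsets[overhang.upper()].append(fragment_id)
    return dict(subsets)


def max_subsets(overhang_length: int) -> int:
    """Upper bound on the number of subsets: distinct overhangs of that length."""
    return 4 ** overhang_length


example = [("frag1", "ACGA"), ("frag2", "TTGC"), ("frag3", "acga")]
print(partition_by_overhang(example))  # {'ACGA': ['frag1', 'frag3'], 'TTGC': ['frag2']}
print(max_subsets(4))                  # 256
```

If overhang sequences are roughly random, each subset holds about 1/4^k of the fragments for a k-nucleotide overhang, which is how the sample complexity can be tuned.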

Type of mutations detected by format-I RAA: In essence, the method
of the invention aims to detect mutations affecting the recognition sequences of the
site-specific probe endonuclease reagents. When the probe enzyme cleaves a sample
fragment, the fragment is prevented from being amplified and, as a consequence, will
not give a hybridization signal with its cognate probe. Mutations affecting the
recognition sequence of the probe enzyme will allow amplification of the sample
fragment and will restore the hybridization signal. It is recognized that mutations other
than those affecting the probe enzyme recognition sites may affect the hybridization
signals. In particular, mutations affecting the recognition sites of the sampling
enzymes may also lead to a loss of hybridization signal. Consequently, the mere
detection of a hybridization difference between two samples does not qualify the
difference as being due to an ESP for the probing enzyme. For this, one must also
assay the two samples without probing enzyme cleavage; only those differences that are
correlated with the cleavage by the probing enzyme qualify as genuine ESPs as defined
according to the present invention. Therefore, a preferred embodiment of the methods
of the present invention comprises the comparison of the hybridization signals obtained
with and without cleavage of the same starting material by the probe endonuclease
reagent. Preferably, the digested and undigested sample DNAs are differentially
labeled such that equivalent amounts of the material can be mixed and hybridized
against the same array of probes. A further advantage of measuring the relative
hybridization signals obtained with digested and undigested sample DNAs is that the
signal given by the undigested sample DNA serves as an internal control for
correcting variations in amplification and hybridization.
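The qualification rule for genuine ESPs can be written as a small decision function. This is a minimal sketch assuming that each probe has already been scored present/absent (True/False) in two samples, A and B, both with and without probing-enzyme digestion; the function name and return labels are hypothetical.

```python
def classify_probe(uncut_a: bool, uncut_b: bool, cut_a: bool, cut_b: bool) -> str:
    """Interpret present/absent hybridization calls for one probe in samples A and B.

    uncut_*: signal present without probing-enzyme digestion
    cut_*:   signal present after probing-enzyme digestion
    """
    if uncut_a != uncut_b:
        # difference already visible without the probing enzyme, e.g. a
        # mutation in a sampling-enzyme site: not an ESP, so exclude the probe
        return "excluded"
    if not uncut_a:
        # neither sample gives a signal even without cleavage
        return "no signal"
    if cut_a != cut_b:
        # difference appears only after probing-enzyme cleavage
        return "ESP"
    return "monomorphic"


print(classify_probe(True, True, True, False))   # ESP
print(classify_probe(True, False, False, False)) # excluded
print(classify_probe(True, True, False, False))  # monomorphic
```

The first branch corresponds to the probes that are eliminated because of sampling-site polymorphisms, as illustrated for the uncleaved pattern in Figure 8.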

Identification and design of informative probes to detect ESP-harboring
fragments. In a preferred embodiment of the present invention, sample
DNAs (amplicons) are hybridized to micro-arrays comprising a set of probe DNAs
which are designed such that each probe will hybridize specifically to one sample DNA
fragment. For each set of sample DNA fragments, a specific set of probes is
developed that will detect all the ESPs present in the set of sample DNAs. Since in
most applications only a (minor) fraction of the sample DNAs will actually carry an
ESP for a particular probing reagent, the set of probe DNAs will preferably consist of
a subset of the sample DNA fragments that are informative in that they hybridize to
ESP-harboring sample fragments. Preferably, the probes are highly specific for the
ESP-carrying sample fragments and do not cross-hybridize with other fragments in the
sample. This feature is verified by testing the candidate probes in control hybridization
assays. When developing or designing the probes, care should be taken to avoid
hybridization of the labeled primer used to amplify the sample fragments. When the
probes correspond to a subset of the sample fragments, an alternative set of
adaptors should preferably be used for their amplification.

The sections below describe different approaches that may be used to
assemble sets of unique probe DNAs for fabricating the micro-arrays. Three
alternative approaches are presented; the choice among them is determined primarily by
the degree of nucleotide sequence variation, and hence the ESP frequency, present in the
species under study.

(1) Direct screening. When the ESP frequency is high, such that 10% or more of the
sample fragments carry ESPs, a realistic approach for assembling ESP probes is
to array individual sample fragments and test which of them detect an ESP in the
test material under study. The advantage of this approach is that the same set of
fragments can be tested with different probe enzymes. After the screening, one
will retain only those probes that yield a clear-cut difference in hybridization
between the different test DNAs. This approach is illustrated in Example 2.

(2) Gel-based screening. With genomic DNA exhibiting intermediate ESP frequencies
(a few %), useful probes can be identified with a gel-based screening approach
in which the ESPs are identified by comparing the patterns of sample fragments
obtained from cleaved and uncleaved genomic DNA of various individuals. The
polymorphic fragments can then be isolated from the gel and cloned or amplified.
In a second phase, these probe fragments are verified in a micro-array
hybridization assay. This approach is illustrated in Example 1.

(3) Batch-wise hybridization selection method. Since both approaches described above
are inefficient and labor-intensive when the ESP frequency is low (<1%), it is
advantageous to directly select or enrich ESP-carrying fragments. Such an
approach is described in greater detail in Example 3.

The methods of the invention can be used with any type of micro-array:
spotted ESP-carrying fragments, spotted oligonucleotides, or oligonucleotides
synthesized on solid supports using photolithography [Fodor S. P. A. et al., Science
251: 767-773 (1991)]. Oligonucleotide probes can easily be designed based on the
nucleotide sequences of the ESP-carrying fragments. Also, the methods of the
invention are not limited to the use of planar arrays containing spatially addressable
probes. A person of skill in the art will recognize that the methods may also employ
a multitude of identifiable solid phase particles (e.g. beads, spheres, and polyhedra),
each carrying a different probe. Examples of such use are described by Fulton, R.
[U.S. Patent No. 5,736,330] and Mandecki, W. [U.S. Patent No. 5,736,332].

(II) Format-II RAA (General method)

The 'format-I RAA', as described above, can be converted to a
'format-II assay' when sufficient sequence information on ESP-containing sample
fragments becomes known. Format-II RAAs can also be designed on the basis of the
known sequences of genomic regions that harbor an ESP and that are available through
publicly accessible databases. The approach involves the targeted sampling of starting
material and consists of the design of dedicated primer pairs that flank the ESP sites.
As in format-I RAA, if the site is intact, the starting DNA will be cleaved and no
PCR product will be generated. Only when the site is mutated will the amplicon be
generated. In practice, multiple ESP-containing genomic regions are co-amplified after
cleavage with the probing restriction endonuclease reagent. The ultimate sample DNA
used in the hybridization reaction is composed of several such multiplex PCR reactions
pooled together. The feasibility of this approach is evidenced by the recent paper of
Wang et al., Science 280: 1077-1082 (1998), incorporated herein by reference. The
methods for format-II RAA described here are identical to the approach described by
Wang et al. in the way certain allelic regions are co-amplified, but fundamentally
different in the way they are diagnosed. The present method takes advantage of the
clear distinction between having or not having an amplicon depending upon the allelic
state of the endonuclease target site. The Wang et al. approach, in contrast, relies on the
detection of a hybridization difference resulting from a single nucleotide variation in the
PCR product. This requires a much more elaborate and redundant hybridization assay.

Similar to format-I RAA, a preferred method consists of comparing the
hybridization signals obtained with and without cleavage with the probe restriction
endonuclease reagent. Preferably, the respective amplification reactions are
differentially labeled such that the resulting amplicons can be mixed and hybridized
against the same array of probes.

Preferred methods of the format-II RAA are those wherein, of each
PCR primer pair, the primer that remained unlabeled is used as hybridization probe
for the corresponding amplicon. This ensures that the excess unincorporated labeled
primer as well as the primer extension products obtained with this primer cannot anneal
to the arrayed probe. Also, the unlabeled PCR primer is complementary to the labeled
strand of the amplicon.
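Why the unlabeled primer makes a suitable probe can be checked with a perfect-match string model of hybridization. This is a sketch under that simplifying assumption (no mismatches, no partial duplexes); the primer and amplicon sequences are hypothetical, with the forward primer taken to carry the label.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def anneals(probe: str, strand: str) -> bool:
    """Perfect-match model: the probe anneals if its reverse complement
    occurs in the strand."""
    return revcomp(probe) in strand.upper()


labeled_primer = "GCGCGATCAG"    # hypothetical forward primer, carries the label
unlabeled_primer = "TGCCAGTCAG"  # hypothetical reverse primer, spotted as probe
labeled_strand = "GCGCGATCAGCCTTCAGGCTGACTGGCA"  # extended from the labeled primer
unlabeled_strand = revcomp(labeled_strand)       # extended from the unlabeled primer

print(anneals(unlabeled_primer, labeled_strand))    # True: signal-giving strand
print(anneals(unlabeled_primer, labeled_primer))    # False: free labeled primer
print(anneals(unlabeled_primer, unlabeled_strand))  # False: same sense as the probe
```

The labeled strand ends in the reverse complement of the unlabeled primer, so it is captured by the probe, while neither the free labeled primer nor the strand synthesized from the unlabeled primer is complementary to it.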

Furthermore, the format-II RAA method provides a means to monitor
mutations in specific genes or loci in addition to scanning the entire genome. Indeed,
sets of PCR primers that target ESPs in a specific gene or chromosome region can be
assembled.

An RAA assay with positive detection of both alleles: It is
recognized that the 'present/absent score' of the RAA assay cannot (always) distinguish
between different mutations that can affect cleavage by the probe restriction
endonuclease reagent. In practice, an ESP should not be assayed when available
evidence indicates the existence of two or more such mutations at significant
frequencies in the population.

In a preferred embodiment the present invention is directed to the
detection of SNPs that result in the simultaneous loss and gain of a restriction enzyme
recognition site, i.e. each of the two alleles is associated with a different recognition
site. HgaI (GACGC) and SfaNI (GATGC) are an example of such reciprocal sites. Use
of both probing endonuclease reagents in side-by-side experiments excludes alternative
alleles and results in easy determination of the zygosity (refer to Example 4).
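The side-by-side logic for reciprocal sites can be sketched as follows. The sketch assumes that only the two alleles GACGC and GATGC occur at the locus, that each enzyme's cut falls between the primers (HgaI and SfaNI are type IIS enzymes that cleave outside their recognition sequence), and that a diploid sample gives a product whenever at least one allele escapes cleavage. The allele sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
HGAI_SITE = "GACGC"   # HgaI recognition sequence
SFANI_SITE = "GATGC"  # SfaNI recognition sequence


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the (non-palindromic) site occurs on either strand."""
    seq = seq.upper()
    return site in seq or revcomp(site) in seq


def digest_gives_product(alleles, site) -> bool:
    """A product is expected if at least one allele lacks the site."""
    return any(not has_site(allele, site) for allele in alleles)


def call_genotype(hgai_product: bool, sfani_product: bool) -> str:
    """Assumes only the two reciprocal alleles occur at the locus."""
    if hgai_product and sfani_product:
        return "heterozygous"
    if hgai_product:    # every copy carries the SfaNI site
        return "homozygous GATGC allele"
    if sfani_product:   # every copy carries the HgaI site
        return "homozygous GACGC allele"
    return "no call"    # neither digest gives a product, e.g. a failed reaction


allele_c = "AAGACGCTT"  # hypothetical allele carrying the HgaI site
allele_t = "AAGATGCTT"  # hypothetical allele carrying the SfaNI site
for sample in ([allele_c, allele_c], [allele_c, allele_t], [allele_t, allele_t]):
    print(call_genotype(digest_gives_product(sample, HGAI_SITE),
                        digest_gives_product(sample, SFANI_SITE)))
# homozygous GACGC allele
# heterozygous
# homozygous GATGC allele
```

Each allele is detected positively, as a product in the digest with the enzyme whose site it lacks, so a heterozygote is recognized by two products rather than by a quantitative signal ratio.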

Multi-allelic haplotyping. A single ESP represents a bi-allelic marker,
which is less informative than a variable micro-satellite, which has multiple alleles.
It is possible, however, to compensate for the lower information content by identifying
several ESPs in a specific chromosomal region. Format-II RAA lends itself readily
to such an approach, which involves the design of a primer pair that encompasses a
region with a single site for each of the various selected probe endonuclease reagents.
It should be recognized as one of the advantages of the present method that multiple
ESPs on a sample amplicon can be interrogated with a single probe. Furthermore, use
of the probing enzymes, either separately or in various combinations, in parallel
experiments allows the construction of the haplotype for the ESPs under study. In
general, the statistical associations between traits and specific chromosome regions may
be more apparent when haplotypes rather than individual markers are used.
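Why combined digests reveal haplotypes can be shown with a small enumeration. The sketch assumes a diploid sample scored only for presence or absence of a product, where a product appears whenever at least one chromosome copy of the amplicon carries none of the sites of the enzymes applied together; the enzyme names E1 and E2 are hypothetical.

```python
from itertools import combinations


def product_expected(haplotypes, digest) -> bool:
    """haplotypes: one set per chromosome copy, listing the probing enzymes
    that cut that copy of the amplicon. digest: enzymes applied together.
    A product is expected if at least one copy carries none of their sites."""
    return any(not (cutters & set(digest)) for cutters in haplotypes)


def digest_pattern(haplotypes, enzymes):
    """Expected product (True/False) for every single and combined digest."""
    return {combo: product_expected(haplotypes, combo)
            for size in range(1, len(enzymes) + 1)
            for combo in combinations(enzymes, size)}


enzymes = ("E1", "E2")             # hypothetical probing enzymes
trans_phase = [{"E1"}, {"E2"}]     # each site on a different chromosome
cis_phase = [{"E1", "E2"}, set()]  # both sites on the same chromosome

print(digest_pattern(trans_phase, enzymes))
# {('E1',): True, ('E2',): True, ('E1', 'E2'): False}
print(digest_pattern(cis_phase, enzymes))
# {('E1',): True, ('E2',): True, ('E1', 'E2'): True}
```

Both samples are heterozygous at both ESPs and give identical results in the single digests; only the combined digest distinguishes the two phases, which is the information needed to construct the haplotype.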



(III) Format-III RAA

In a general sense, the format-III RAA represents a method of choice for
very high-density SNP genotyping because it provides a means to overcome the
intrinsic limitations of both the format-I RAA and the format-II RAA. This is
essentially achieved by performing a stepwise amplification involving a
pre-amplification of sample fragments followed by amplification using multiplexed
specific primers. The principal advantage of the pre-amplification step is to reduce the
complexity of the starting DNA, and thus to provide a more favorable starting point
for performing multiplex PCR reactions. Note that this improvement is generally
applicable to any multiplex PCR reaction and is not limited to the methods of the
present invention. Such an approach can also be used when, for example, SNPs are
genotyped using the methods described by Wang et al.

The principal limitation of the format-I RAA lies in the complexity of
the sample DNA that is hybridized to the microarray. Because the second round of
amplification in format-III yields only very small amplicons, all of which are informative,
there is no longer a limit on the number of sample fragments that can be interrogated. In
fact, the entire genome may be sampled in a series of parallel pre-amplification
reactions, and the amplicons generated in the different multiplex PCR reactions can then
be pooled together and hybridized to the microarray.

Likewise, the format-III RAA represents a preferred way of performing the format-II
RAA, especially when the ESPs under study are located on fragments generated by one
set of sampling endonuclease reagents. Such stepwise amplification comprises the
co-amplification of sample fragments with a single pair of primers, followed by the
selective amplification of sets of specific ESP-containing regions (see Figure 5). The
principal advantage of the format-III RAA over the format-II RAA is that the initial
amplification of the sampling fragments, which represent only a fraction of the total
genome, lowers the amount of starting material required to interrogate a very large
number of ESPs. The approach also facilitates the multiplex amplification of the
ESP-specific amplicons and, consequently, yields a more robust assay.

One preferred embodiment of the format-III RAA is its use to genotype
large numbers of ESPs identified through the use of the format-I RAA. Indeed,
format-I RAA offers a rapid means to discover large numbers of ESPs in any biological
species for which no large body of sequence information is or will be available. Format-I
RAA enables one to discover many sets of ESPs for a number of different probing
enzymes. Using the format-I RAA, each set of ESPs must be assayed on a different
microarray, because otherwise signals for the same sample fragment will overlap with
one another and thus preclude proper determination of the ESP genotype. Using the
format-III RAA, the ESPs identified with different probing enzymes can be assayed
together on one single microarray, without overlap between the different ESPs. The
reason is that the overlap in the format-I RAA is caused by the non-informative sample
fragments that are always co-amplified with the ESP fragments. These are eliminated
from the mixture by the specific PCR amplification. This embodiment is illustrated in
Examples 2 and 3.

Another preferred embodiment of the format-III RAA is its use to
genotype large numbers of SNPs identified in high-throughput sequencing of genomic
DNA from different individuals of a given species. Given the generally recognized
importance of SNPs for the development of high-resolution genotyping methods,
sequenced SNPs can be expected to accumulate in large numbers in publicly available
databases in the near future. In particular, in the field of human genetic analysis, SNPs
will be discovered at a rapidly increasing rate through the massive genome sequencing
programs now in progress. A similar evolution may be anticipated for many other
species. Hence we decided to perform an in silico analysis of known human SNPs to
further investigate the potential of the invention. More particularly, we have analyzed
the 3,358 SNP sequences present in the SNP database of the Whitehead Institute [Wang
et al., Science 280: 1077-1082 (1998)]. We have determined how many of these SNPs
represent an ESP for each of 34 known palindromic and non-palindromic tetra- and
penta-nucleotide restriction recognition sequences. When this number is extrapolated to
the total number of ESPs in the human genome, assuming a grand total of 3 million
SNPs, it appears that the number of detectable ESPs per probing restriction enzyme
is in the range of 25,000 to 150,000. A cumulative analysis reveals that 53% of the
SNPs affect at least one of the 34 restriction sites; a total of 28% affect the recognition
site for one of the available tetracutter enzymes. The principal conclusion from this
analysis is that many of the considered enzymes, used as probing enzymes according
to the methods of the present invention, will interrogate sufficient SNPs to be able to
build a high-density map of the human genome. It should also be noted that the use of
multiple probing enzymes is easily accommodated in the targeted assay because the
sample has to be subdivided anyway over a number of parallel multiplex PCR
reactions. This embodiment is illustrated in Example 4.
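The genome-wide estimates in the last column of Table I appear to follow from a simple linear scaling: the fraction of the 3,358 surveyed SNPs that are ESPs for a given enzyme, multiplied by the assumed genome-wide total of 3 million. The sketch below is a minimal reconstruction under that assumption (the function and constant names are illustrative); it reproduces the tabulated values for a few enzymes.

```python
SURVEYED_SNPS = 3358             # SNPs from the Whitehead database analysed in Table I
ASSUMED_GENOME_SNPS = 3_000_000  # grand total assumed in the text


def estimated_genome_esps(esp_count, surveyed=SURVEYED_SNPS,
                          genome_total=ASSUMED_GENOME_SNPS):
    """Scale an observed ESP count linearly to the whole genome, rounded to the nearest thousand."""
    return int(round(esp_count / surveyed * genome_total, -3))


for enzyme, count in [("Tsp509I", 122), ("MspI", 77), ("BstUI", 27), ("MnlI", 175)]:
    print(enzyme, estimated_genome_esps(count))
```

The linear scaling treats the surveyed SNPs as a representative sample of all SNPs in the genome; any bias in how the database SNPs were discovered carries straight through to the estimates.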

It is noted that the format-III RAA may be performed according to
different procedures. One such procedure is diagrammed in Figure 5, in which the
test DNA is first sampled using a sampling endonuclease reagent, pre-amplified and
then treated with the probing endonuclease reagent. Variations on this procedure are
readily recognized by those skilled in the art and include, for example, concomitant
treatment of the test DNA with both the sampling and the probing endonuclease
reagents, and the preparation of sampled DNA fragments using arbitrary PCR priming
methods [Williams et al., Nucleic Acids Res. 18: 6531-6535 (1990)]. Note that when
the treatment with the probing endonuclease reagent is performed prior to the
pre-amplification, the subsequent amplification can be performed with any pair of PCR
primers directed against the ESP-carrying fragments, thus overcoming the
limitation that the PCR primers must flank the ESPs.



Table I. Analysis of 3,358 SNPs in the Whitehead SNP database. The table lists the
number of SNPs that represent an ESP for various probing enzymes. The last column
shows the estimated number of ESPs for each enzyme in the entire human genome
(refer to text for details).



Enzyme          Site     ESPs    Estimated total number of ESPs

Tetracutter enzymes
Tsp509I         AATT     122     109,000
MaeII           ACGT     106     95,000
AluI            AGCT     98      88,000
NlaIII          CATG     158     141,000
MspI            CCGG     77      69,000
BstUI           CGCG     27      24,000
[illegible]     CTAG     67      60,000
Sau3AI          GATC     58      52,000
HinP1I          GCGC     49      44,000
HaeIII          GGCC     52      46,000
Csp6I           GTAC     71      63,000
TaqI            TCGA     50      45,000
MseI            TTAA     109     97,000

Pentacutter enzymes
Tsp4CI          AGNCT    114     102,000
BssKI           CCNGG    79      71,000
DdeI            CTNAG    122     109,000
HinfI           GANTC    77      69,000
Fnu4HI          GCNGC    71      63,000
Sau96I          GGNCC    64      57,000
[row illegible in source]

Non-palindromic enzymes
AciI            CCGC     111     99,000
MnlI            CCTC     175     156,000
BbvI            GCAGC    65      58,000
BsmAI           GTCTC    67      60,000
BsmFI           GGGAC    39      35,000
FokI            GGATG    66      59,000
HgaI            GACGC    31      28,000
PleI            GAGTC    39      35,000
SfaNI           GCATC    51      46,000
AlwI            GGATC    37      33,000
BsrI            ACTGG    76      68,000
HphI            GGTGA    69      62,000
MboII           GAAGA    85      76,000
TspRI           CAGTG    94      84,000

The following illustrative examples were chosen to represent the
spectrum of genomic complexities and the spectrum of degrees of genetic variation
that are susceptible to analysis using the methods of the present invention:

Example 1 describes analysis of Arabidopsis (low genomic complexity,
low genetic variation).

Example 2 describes genetic analysis of corn (high genomic complexity,
high genetic variation).

Examples 3 and 4 describe genetic analysis in humans (high genomic
complexity, low genetic variation).

Numbers given in the examples that relate to the occurrence
frequency of certain restriction sites, as well as the average size of the generated
fragments, are in part based on computer simulations using publicly available DNA
sequences.
Example 1

Genetic Analysis in Arabidopsis

In this example, a fragment analysis-based approach is used to generate
a set of genomic fragments carrying ESPs between the Arabidopsis ecotypes Landsberg
and Columbia, which are commonly used for genetic studies in this model organism.
Arabidopsis is an example of a low-complexity genome (size ~120 Mb), and the two
ecotypes exhibit a moderate level of genetic variability. Previous studies have revealed
that the average nucleotide sequence variation between the two ecotypes is in the order
of 1 polymorphism in 150 nucleotides. Consequently, the fraction of fragments expected
to carry an ESP for tetranucleotide-recognizing restriction enzymes is in
the range of 2.5% (1:40). With such a low frequency, it is helpful to use a selection
procedure to isolate the rare fragments containing ESPs.
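The 1:40 figure can be checked with a back-of-the-envelope model. The sketch below assumes that polymorphisms are independent and evenly spread at one per 150 nucleotides, that each fragment carries a single 4-nucleotide probe site, and that only mutations destroying an existing site are counted (mutations creating a new site are ignored). These simplifications are assumptions made here for illustration, not statements from the source.

```python
def expected_esp_fraction(site_length, polymorphism_interval, sites_per_fragment=1):
    """Probability that at least one probed site on a fragment contains a polymorphism.

    Assumes independent, uniformly spread polymorphisms (one per
    `polymorphism_interval` nucleotides) and counts only site losses.
    """
    p_per_base = 1 / polymorphism_interval
    return 1 - (1 - p_per_base) ** (site_length * sites_per_fragment)


fraction = expected_esp_fraction(site_length=4, polymorphism_interval=150)
print(f"{fraction:.2%} of fragments, about 1 in {round(1 / fraction)}")
```

The result, a little over 2.5% (about 1 in 38), is in line with the 1:40 estimate quoted above.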

In essence, the procedure described in this example comprises the
following steps:

1) Identification of a set of about 200 genomic fragments carrying
Landsberg/Columbia ESPs using a gel-electrophoretic approach.

2) Isolation and characterization of the ESP-carrying DNA
fragments (ESP fragments).

3) Generation of micro-arrays with the ESP fragments.

4) Confirmation of the ESPs by hybridization.

Step 1. Identification of ESP fragments
Sampling enzymes. In the present example, EcoRI, a restriction enzyme
recognizing 6 nucleotides (hexacutter), in combination with BfaI, a restriction enzyme
recognizing 4 nucleotides (tetracutter), is chosen as the pair of sampling enzymes. From the
random frequency of occurrence of 6-nucleotide sequences (about once every 4,000 bases), the
number of sites for hexacutter restriction enzymes in this genome is predicted to be in
the range of 30,000. In addition to cleavage with a hexacutter, the genomic DNA is
also cut with a tetracutter so as to generate PCR-amplifiable fragments with an average
size of a few hundred base pairs. Cleavage with the two enzymes gives rise to two
types of fragments: a majority of fragments resulting from cleavage by the tetracutter
enzyme alone and a smaller set of fragments produced by the two enzymes (see Figure
5). Since the majority of the hexacutter sites will give rise to two fragments
having a hexacutter end and a tetracutter end (see Figure 5), this procedure will yield
a mixture of about 60,000 fragments of this type. Upon amplification using the
procedure described below, only the fragments carrying a tetracutter end and a
hexacutter end are amplified efficiently (Figure 5).
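The site and fragment counts quoted above follow from random-sequence expectations: a particular 6-nucleotide sequence is expected about once every 4^6 = 4,096 bases and a 4-nucleotide sequence once every 256 bases. The sketch below reproduces that arithmetic for a ~120 Mb genome. It assumes random sequence with equal base frequencies and that the tetracutter cuts on both sides of every hexacutter site, so it is only a first approximation; real counts in AT-rich plant DNA will deviate.

```python
def expected_sites(genome_length_bp, site_length):
    """Expected occurrences of one specific (palindromic) recognition sequence.

    Assumes random sequence with equal frequencies of A, C, G and T.
    """
    return genome_length_bp / 4 ** site_length


ecori_sites = expected_sites(120_000_000, 6)  # Arabidopsis genome of ~120 Mb
print(round(ecori_sites))        # hexacutter (EcoRI) sites
print(round(2 * ecori_sites))    # EcoRI-BfaI fragments, assuming one on each side of every EcoRI site
print(4 ** 4)                    # average spacing in bp of a tetracutter site such as BfaI
```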

Probing enzymes. Many different tetracutter enzymes can be used as
probing enzymes. Ideally, the probing enzyme cleaves most of the sample
fragments once. Because plant DNA has a high AT content, the preferred tetracutters
are those that have an AT bias in their recognition sequence. In general, the choice of
an optimal tetracutter may be determined by particular features of the genome being
analyzed (e.g., AT and GC content). In the present example, MseI (recognition site =
TTAA) was chosen. Tsp509I (recognition site = AATT) is an alternative. It is also
conceivable to use mixtures of two or more tetracutter enzymes. The EcoRI-BfaI
sample/target fragments that are cleaved and not cleaved with the MseI probing enzyme
are referred to as cleaved and uncleaved sample/target DNA, respectively.

Screening for ESP-carrying fragments. To detect ESP fragments, subsets
of uncleaved and cleaved EcoRI-BfaI sample fragments from both ecotypes are
amplified and the amplicons are compared following gel-electrophoretic fractionation.
Subsets of the EcoRI-BfaI sample fragments are selectively amplified as described
[Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995); Zabeau, M. and Vos, P.,
European Patent Application EP 0534858 (1993), both of which are incorporated herein
by reference]. Given the complexity of the sample (~50,000 fragments), the selective
amplifications are performed with EcoRI and BfaI primers having two and three
selective nucleotides, respectively. This amounts to 1,024 (16 x 64) different selective
amplification reactions.
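The count of 1,024 reactions is simply the product of the possible selective extensions: 4^2 = 16 for the EcoRI primer and 4^3 = 64 for the BfaI primer. The sketch below enumerates the primer sets from the primer cores given further on in this example (SEQ ID NO: 5 and 6 without their selective N positions) and, under the idealizing assumption that the ~50,000 sample fragments are split evenly, estimates the number of amplicons per reaction.

```python
from itertools import product

ECORI_PRIMER_CORE = "GACTGCGTACCAATTC"  # SEQ ID NO: 5 without the selective NN
BFAI_PRIMER_CORE = "GATGAGTCCTGAGTAG"   # SEQ ID NO: 6 without the selective NNN


def selective_primers(core, n_selective):
    """All primers formed by appending every possible n-nucleotide selective extension."""
    return [core + "".join(ext) for ext in product("ACGT", repeat=n_selective)]


ecori_primers = selective_primers(ECORI_PRIMER_CORE, 2)
bfai_primers = selective_primers(BFAI_PRIMER_CORE, 3)
reactions = [(e, b) for e in ecori_primers for b in bfai_primers]

print(len(ecori_primers), len(bfai_primers), len(reactions))  # 16 64 1024
print(round(50_000 / len(reactions)))  # amplicons per reaction if fragments were split evenly
```

In practice the split is uneven, since each selective nucleotide combination captures a different, sequence-dependent subset of fragments.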

The experimental procedure described by Vos, P. et al. is followed,
except that the template fragments are incubated at 65°C for 10 minutes to
heat-inactivate the T4 ligase enzyme and, when applicable, digested with the probing
enzyme prior to amplification. The structures of the EcoRI and BfaI adaptors are as
follows [see, e.g., Vos, P. et al., supra]:



wo 00/28081 



-27- 



PCT/1B99/01958 



5'-CTCGTAGACTGCGTACC (SEQ ID NO: 1)

CATCTGACGCATGGTTAA-5' (SEQ ID NO: 2)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)

TACTCAGGACTCAT-5' (SEQ ID NO: 4)

The EcoRI primer (radiolabeled by 5'-phosphorylation) and the BfaI primer,
having two and three selective nucleotides, respectively, have the following sequences
(where N represents A, C, G, or T):

5'-GACTGCGTACCAATTCNN (SEQ ID NO: 5)
5'-GATGAGTCCTGAGTAGNNN (SEQ ID NO: 6)

Using these reagents, most of the obtainable target fragments contain a
cleavage site for the probing enzyme and, consequently, will not be amplified when the
target DNA is cleaved. Most of the fragments that survive the treatment with the
probing enzyme occur in both ecotypes, and thus carry no ESP. Occasionally, fragments
are found that appear in both ecotypes when the target DNA is not digested but that
are present in only one of the two ecotypes after digestion. These represent true ESPs
for the probing enzyme. In addition, fragments will also be found that show a typical
AFLP polymorphism between the two ecotypes [Vos, P. et al., Nucleic Acids Res. 23:
4407-4414 (1995)]. Such polymorphisms are apparent in the fragment patterns
obtained with the undigested sample DNAs. A typical result is shown in Figure 6,
which shows the electrophoretic patterns of selectively amplified EcoRI-BfaI
fragments from ecotypes Columbia and Landsberg, obtained without and with
digestion with the MseI probing enzyme.

Systematic comparison of the patterns of ecotypes Columbia and
Landsberg before and after digestion allows the identification of EcoRI-BfaI sample
amplicons that carry an ESP for the probing enzyme. Using MseI as probing enzyme,
it is estimated that a total of ~200 polymorphic fragments that are present in only one
of the ecotypes can be identified.
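The decision rules applied when comparing the four lanes for a given band can be summarized as a small function. This is a sketch that treats each band simply as present (True) or absent (False); the category labels are descriptive names chosen here for illustration.

```python
def classify_band(col_undigested, ler_undigested, col_digested, ler_digested):
    """Interpret presence/absence of one band in the four lanes.

    Lanes: Columbia and Landsberg, each amplified without and with prior
    digestion by the probing enzyme.
    """
    if not (col_undigested or ler_undigested):
        return "not amplified"
    if col_undigested != ler_undigested:
        # Difference already visible without the probing enzyme.
        return "AFLP polymorphism (visible without digestion)"
    if col_digested and ler_digested:
        return "monomorphic, no probe site"
    if not (col_digested or ler_digested):
        return "monomorphic, probe site in both ecotypes"
    # Band survives digestion in one ecotype only: the other ecotype carries the site.
    return "ESP, site in Landsberg" if col_digested else "ESP, site in Columbia"


print(classify_band(True, True, True, False))
```

Only the last branch corresponds to a true ESP for the probing enzyme; the AFLP-type polymorphisms are recognized from the undigested lanes alone.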
Step 2. Isolation and characterization of ESP fragments
Each of the ESP polymorphic fragments is eluted from the gel matrix,
re-amplified and cloned into a suitable plasmid vector (e.g. TA cloning system;
Invitrogen, Carlsbad, CA, U.S.A.). In each case, two clones are selected for sequence
determination. Most duplicate clones will yield the same sequence. Duplicate clones
that gave different sequences were not retained for further work. Since the nucleotide
sequence of over one third of the Arabidopsis genome is available in the public
databases (e.g., GenBank), the chromosomal location of about one third of the ESP fragments
can be determined by matching the fragment sequences to the genomic sequence.
Furthermore, since the genomic sequence is derived from ecotype Columbia, we expect
a perfect match with the fragment sequences isolated from the same ecotype. The
sequences of the fragments isolated from ecotype Landsberg will reveal single
nucleotide differences, among which the potential restriction site mutations affecting
the MseI recognition sites should be apparent.

In addition to the ESP polymorphic fragments, a number of
non-polymorphic control fragments are processed in the same way. Two types of such
monomorphic control fragments are isolated: fragments that do not carry a site for the
probing enzyme and fragments that carry a site for the probing enzyme in both
ecotypes. These fragments serve to verify the hybridization on the
micro-arrays.

Step 3. Fabrication of ESP micro-arrays
Micro-arrays of amplified fragments. The insert DNAs from the
sequence-verified clones are amplified, e.g. with the use of non-selective EcoRI and
BfaI primers. PCR products are verified by agarose gel electrophoresis and retained if
a single product of the correct mobility was present. Following ethanol precipitation,
the resuspended PCR products are arrayed at high density on standard glass slides (25
x 76 mm) using either the MuWgrid robotic spotter (GeneMachines™, Genomic
Instrumentation Services Inc., Menlo Park, CA, U.S.A.) or the BioChip Arrayer™
(Packard Instrument Company, Meriden, CT, U.S.A.). The DNAs are spotted in a
logical order with respect to the ecotype from which the fragments were isolated (upper
and lower panel), as shown in Figure 7. In addition, a set of DNAs from monomorphic
control fragments was spotted next to the ESP fragment DNAs (right panel in Figure
7).

Micro-arrays of oligonucleotides. Based on the nucleotide sequences of
the ESP fragments, oligonucleotides can be designed that can serve as hybridization
probes to specifically detect each amplified sample fragment. The oligonucleotide probe
should preferably match a sequence that is located on one side of the ESP,
opposite the side where the sequence targeted by the labeled primer is located. In this
way the background is minimized, because the linear amplification products generated
by the labeled primer following digestion with the probing enzyme are not detected.
The ESP fragment-specific oligonucleotides are spotted in a micro-array format in
exactly the same way as the amplified ESP fragments.
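A hypothetical helper for choosing such an oligonucleotide is sketched below. It assumes that the fragment sequence is known, is written 5' to 3' starting from the labeled-primer end, and contains exactly one palindromic site (such as MseI TTAA) for the probing enzyme. The parameters `probe_len` and `tail_exclude` are illustrative values, not taken from the source, and returning the reverse complement so that the oligonucleotide pairs with the labeled strand is likewise an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def probe_oligo(fragment, site, probe_len=25, tail_exclude=16):
    """Pick an oligonucleotide on the far side of the single probe site (hypothetical helper).

    The window starts after the site, so truncated linear products made by
    the labeled primer on cleaved templates cannot reach it, and it stops
    `tail_exclude` nucleotides short of the other end to avoid the
    adaptor/primer sequence shared by all fragments.
    """
    first = fragment.find(site)
    if first < 0 or fragment.find(site, first + 1) >= 0:
        raise ValueError("fragment must contain exactly one probe site")
    start = first + len(site)
    stop = len(fragment) - tail_exclude
    if stop - start < probe_len:
        raise ValueError("not enough sequence between the site and the primer end")
    window = fragment[start:start + probe_len]
    # Complementary to the labeled strand (strand choice is an assumption).
    return reverse_complement(window)
```

A real design would also screen candidate windows for melting temperature, repeats and cross-hybridization with other fragments on the array, which this sketch does not attempt.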

Step 4. Micro-array-based detection of ESPs
Preparation of the sample DNAs. For each ecotype, sample DNA is
prepared in two different ways. Genomic DNA, digested with the sampling restriction
enzymes EcoRI and BfaI, is amplified either as such or after cleavage with the
probing enzyme MseI. The amplification reactions are performed with a fluorescently
labeled EcoRI primer and an unlabeled BfaI primer, both without selective nucleotides.
The EcoRI primer is labeled by incorporation of Cy3 (green)- and Cy5 (red)-amidites
during primer synthesis (Amersham Pharmacia Biotech, Uppsala, Sweden). For both
Columbia and Landsberg, the cleaved sample was amplified with a Cy3-labeled primer, while
the uncleaved fragments were amplified with a Cy5-labeled EcoRI primer. In addition,
the Landsberg digested material was also amplified with a Cy5-labeled EcoRI PCR
primer. Three different hybridization solutions are then prepared by mixing equal
amounts (i.e. equal volumes) of the Cy3- and Cy5-labeled amplification reactions: one
from the Columbia cleaved and uncleaved samples, a second from the Landsberg
cleaved and uncleaved samples, and a third by mixing the differentially labeled cleaved
samples of both ecotypes.

In case arrays of PCR products, rather than oligonucleotides, are used
as probes (refer to step 3), the co-amplification of the EcoRI-BfaI sample fragments is
preferably accomplished with a pair of adaptors that differs from those attached to the
arrayed probes. The alternative EcoRI and BfaI adaptors have the following structure:

5'-GAGCATCTGACGCATCC (SEQ ID NO: 26)

GTAGACTGCGTAGGTTAA-5' (SEQ ID NO: 27)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)

ATGAGTCCTGACAT-5' (SEQ ID NO: 14)
The cognate non-selective EcoRI and BfaI primers have the following
sequences:

5'-CTGACGCATCCAATTC (SEQ ID NO: 28)
5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

Micro-array hybridization. Each of the hybridization solutions is allowed
to hybridize to the arrayed probes using protocols well known in the art. The
experimental conditions depend primarily on the nature of the probes (PCR-amplified
fragments versus oligonucleotides). Both types of experiments are amply described in the
literature: Wodicka, L. et al., Nature Biotechnol. 15: 1359-1367 (1997); Lockhart, D.
J. et al., Nature Biotechnol. 14: 1675-1680 (1996); DeRisi, J. L. et al., Science 278:
680-686 (1997); Shalon, D. et al., Genome Res. 6: 639-645 (1996); Piétu et al.,
Genome Res. 6: 492-503 (1996); Chee, M. et al., Science 274: 610-614 (1996); Wang,
D. G. et al., Science 280: 1077-1082 (1998); Winzeler, E. A. et al., Science 281: 1194-1197
(1998), all of which are incorporated herein by reference.

A laser scanning system (ScanArray 3000; General Scanning Inc.,
Watertown, MA, U.S.A.) is used to detect the two-color fluorescence hybridization
signals from the micro-arrays at a resolution of 10 microns per pixel. A separate scan
is carried out for each of the two fluorophores used. Scanning parameters and laser
power settings are adjusted to normalize the signal in the two channels (channel-1/Cy3;
channel-2/Cy5). The obtained digital images are analyzed using the ImaGene™ image
analysis software (BioDiscovery Inc., Los Angeles, CA, U.S.A.). The extracted
quantitative data are transferred to a spreadsheet for further analysis.

The present hybridization experiment is essentially set up as a
confirmation of the gel-electrophoretic data (refer to step 1), and therefore has a
predictable outcome. In addition, a number of control probes are included on the
biochip that detect monomorphic EcoRI-BfaI Arabidopsis fragments (i.e., fragments
on which a site for the probing enzyme is either present or absent in both ecotypes).
The results from these control probes allow correction for background and optical
cross-talk between the two channels, as well as calibration of the red and green
hybridization signals. It is anticipated that the vast majority of the processed data are
unambiguous with respect to the allelic state of a sample fragment and in agreement
with the gel-electrophoretic analysis. Figure 7 shows a false-color representation of the
idealized results of the present experiment using a fictitious array of probes. It cannot
be excluded that certain hybridization results are not in agreement with the
gel-electrophoretic assay and/or that certain probes do not allow unambiguous
determination of the allelic state of the cognate sample fragment. Such probes should
be excluded from the micro-arrays that are used to genotype experimental Arabidopsis
samples other than the Columbia and Landsberg controls used in the present
illustrative example.

In routine genotyping experiments, either one of the hybridization
schemes outlined above can be used. Determination of the allelic state can be done by
comparing the hybridization signals obtained with and without cleavage of the starting
DNA with the probe reagent. Alternatively, allele calling could be based on a
comparison of the signals obtained with the test sample and an appropriate control (e.g.
Columbia or Landsberg DNA), both cleaved with the probe endonuclease reagent. The
samples that need to be compared can, in principle, be hybridized separately, but a
preferred method consists of hybridizing a mixture of differentially labeled samples to
the same array.
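As a rough illustration of the first allele-calling scheme (cleaved versus uncleaved signal of the same sample), the sketch below first calibrates the two channels with monomorphic control probes that lack the probe site, whose cleaved/uncleaved ratio should ideally be 1, and then classifies each probe from its calibrated ratio. The thresholds `low` and `high`, and the reading of an intermediate ratio as a heterozygote carrying one cleavable copy, are assumptions made here for illustration.

```python
from statistics import median


def calibration_factor(control_ratios):
    """Median cleaved/uncleaved ratio of monomorphic control probes lacking the probe site.

    Such probes should give equal signals in both channels, so their median
    ratio estimates the channel imbalance to be divided out.
    """
    return median(control_ratios)


def call_allele(cleaved_signal, uncleaved_signal, factor, low=0.25, high=0.75):
    """Call the allelic state of one probe from its two-channel signals.

    `low` and `high` are illustrative thresholds; a calibrated ratio between
    them is read as a heterozygote carrying one cleavable copy.
    """
    ratio = (cleaved_signal / uncleaved_signal) / factor
    if ratio >= high:
        return "no site (fragment not cleaved)"
    if ratio <= low:
        return "site present (fragment cleaved)"
    return "heterozygous"


factor = calibration_factor([1.25, 1.0, 1.1])
print(call_allele(1100, 1000, factor), call_allele(50, 1000, factor))
```

For inbred material such as the Columbia and Landsberg controls, an intermediate ratio would instead flag the probe as ambiguous, in keeping with the exclusion of such probes described above.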



Example 2
Genetic Analysis in Corn

In this example, the utility of the method of the invention for
marker-assisted selection applications in plant and animal breeding is illustrated. Corn has been
chosen because it is a typical representative of crop species having a complex genome.
The large size of the genome (2,400 Mb), the frequent occurrence of repetitive DNA
sequences and the high degree of genetic variation all constitute technical challenges.
This example uses an approach based on the generation of a set of genomic fragments
carrying ESPs from two well-known inbred lines of corn, B73 and Mo17, from which
many of the corn elite lines are derived. Another reason for choosing these
lines is that a well-studied recombinant inbred population derived from these lines is
available. This population can be used to map the set of ESPs. The genetic map of ESP
markers will prove to be an effective tool for genetic selection in corn breeding. It is
evident, however, that a broader survey of the corn germplasm with a total of 10 to 20
lines will give a large number of additional ESPs (possibly 2 or 3 times as many) and
will eventually result in a higher-resolution genetic map.

The ESP-harboring fragments could very well be identified by the
gel-electrophoretic approach described for Arabidopsis (Example 1). However, an
alternative strategy may be used given that the corn germplasm, like that of many crop
species, exhibits a high degree of genetic variation. Indeed, based on previous studies,
the average nucleotide sequence variation in the corn germplasm is estimated to be in
the order of 1 difference in 15 to 30 nucleotides. This corresponds to a frequency of
ESPs in the recognition sites of tetracutter restriction enzymes of 1 in 4. At this
frequency it becomes feasible to directly examine arrays of random B73/Mo17
fragments for the presence of ESPs using the present RAA method without prior
screening or selection. The strategy also lends itself readily to screening with several
different probing enzymes.

In the present example, two different approaches for assaying ESPs are
used. The first method (format-I RAA) is similar to the one described in Example 1,
and detects ESPs in fragments sampled with a pair of restriction enzymes. In the
second method (format-III RAA), individual ESPs are selectively amplified from the
sampled fragments with dedicated primer sets. The principal advantage of the latter
approach is that ESPs detected with several different probing enzymes can be assayed
simultaneously, and that multiplex amplification of ESP-specific PCR products is made
considerably more robust.

In essence, the procedure described in this example comprises the
following steps:

1. Identification of a set of candidate ESP fragments from the
inbred lines B73 and Mo17

2. Development of a corn ESP micro-array

3. Genetic mapping of a B73/Mo17 recombinant inbred population
and of segregating populations

Step 1. Identification of candidate ESP fragments
Cloning of a set of sample fragments. To clone a set of random
fragments from the inbred lines B73 and Mo17, the enzyme combination PstI and BfaI
is used. The hexanucleotide-recognizing enzyme PstI was chosen because of the large
size of the corn genome. It is estimated that this enzyme has around 30,000 sites in the
corn genome. The second enzyme, the tetracutter BfaI, is expected to cleave in the
majority of cases on both sides of the PstI sites. The double digestion will therefore
generate about 60,000 sample fragments with an average size of 400-500 base pairs.

Following double digestion of the genomic DNA, PstI and BfaI
adaptors were ligated to the fragment ends and the material was amplified with
non-selective PstI and BfaI primers. The structures of the PstI and BfaI adaptors are based
on those described by Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995):

5'-CTCGTAGACTGCGTACATGCA (SEQ ID NO: 7)
3'-CATCTGACGCATGT (SEQ ID NO: 8)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)

3'-TACTCAGGACTCAT (SEQ ID NO: 4)



The corresponding PstI and BfaI non-selective primers have the following sequences:

5'-GACTGCGTACATGCAG (SEQ ID NO: 9)
5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

The amplification step enriches the PstI-BfaI fragments over the large
excess of BfaI-BfaI fragments. After amplification, the fragments are fractionated on
an agarose gel to eliminate the fragments smaller than 100 base pairs, and cloned in an
appropriate vector (e.g. TA cloning system; Invitrogen, Carlsbad, CA, U.S.A.).

Preparation of spotted micro-arrays with the cloned sample DNA
fragments. The insert DNAs from the two libraries of cloned PstI-BfaI sample
fragments (obtained from the B73 and Mo17 inbred lines) are amplified from the
clones using the non-selective PstI and BfaI primers. Following purification and
concentration, the amplicons are arrayed as described in Example 1. A total of 20,000
(i.e. 10,000 from each library) candidate probe DNAs are spotted.

Micro-array hybridization and selection of candidate ESP fragments.
From genomic DNA of the inbred lines B73 and Mo17, four different sets of PstI/BfaI-digested
amplified DNA are prepared. An alternative pair of adaptors and non-selective
amplification primers is used for this:

5'-GAGCATCTGACGCATGTTGCA (SEQ ID NO: 11)
3'-GTAGACTGCGTACA (SEQ ID NO: 12)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGTTGCAG (SEQ ID NO: 15)

5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

The sample fragments are amplified either as such or after digestion with
one of three alternative probing enzymes: MseI, Tsp509I and AluI. Many different
tetracutter or pentacutter enzymes can be used as probing enzymes. Because plant
DNA has a high AT content, the preferred enzymes are those that have an AT bias in
their recognition sequence. Alternatively, mixtures of two or more tetracutter or
pentacutter enzymes can be used.

For each of the B73 samples, a Cy3 (green)-labeled PstI primer is used,
whereas the Mo17-derived fragments are amplified with a Cy5 (red)-labeled PstI primer
(refer to Example 1). Four hybridization solutions are then prepared, each by mixing
equal amounts of the B73 and Mo17 samples that were uncleaved, MseI-cleaved,
Tsp509I-cleaved, or AluI-cleaved, respectively. Each of the 4 mixes is allowed to
hybridize to the micro-arrays. Analysis of the scanned images involved normalization
using the multitude of probes on the arrays that detect monomorphic fragments. Figure 8
shows a false-color representation of the idealized results of the present experiment
using a fictitious array of probes.

Analysis reveals that candidate ESP fragments are readily identified by
scoring the probes that hybridize with only one of the two inbred line sample DNAs
after cleavage with the probe enzyme (Figure 8). The quantitative analysis allows
the use of an unambiguous cut-off threshold of a 10-fold difference in the normalized
signal intensities for scoring ESPs. It should be pointed out that the assay identifies
both bona fide ESPs and polymorphisms in the sampling enzyme sites. Most of the
latter polymorphisms result in a marked hybridization difference with the sample DNAs
not cleaved with the probe enzyme (see Figure 8). Analysis of 180 probes reveals that
roughly 6% of the sample fragments carry ESPs for MseI, Tsp509I, or AluI, in
accordance with the expected ESP mutation frequency. The analysis of 20,000 cloned
probe fragments is thus expected to yield a total of 1,200 fragments carrying ESPs for
the three probe enzymes tested. By using additional tetracutter and pentacutter
enzymes (see Table I), the fraction of ESP-carrying fragments may be as high as 25%,
amounting to 5,000 ESPs.

Of all probes that exhibit a differential hybridization with the cleaved
sample DNAs, only those in which the recognition site for the probing enzyme is
present were retained for development of a corn micro-array. Sequence determination
of these probe fragments reveals the position of the recognition site for the probe
enzyme. Thus, we retained only those probes that failed to give a signal with the
cleaved sample DNA from the same inbred line from which they were isolated. Such
probes exhibit the hybridization pattern shown in the table below and are marked
with an arrow in Figure 8.

B73/Mo17 (Cy3/Cy5) normalized hybridization signal

                 Undigested    MseI/Tsp509I/AluI-digested
B73 probes       ~1            < 0.1
Mo17 probes      ~1            > 10
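The retention rule summarized in the table above can be written as a short Python sketch. The 10-fold cut-off is taken from the text; the tolerance used for an undigested ratio of "~1" (here 0.5 to 2) and the function name `retain_probe` are hypothetical choices for illustration, not part of the described method.

```python
# Hypothetical sketch of the probe-retention rule for the corn ESP screen.
FOLD_CUTOFF = 10.0       # 10-fold difference in normalized signal (from the text)
NEAR_ONE = (0.5, 2.0)    # assumed tolerance for "~1"; not specified in the text


def retain_probe(source_line: str, ratio_undigested: float, ratio_digested: float) -> bool:
    """Decide whether a cloned probe shows the retained ESP pattern.

    Ratios are normalized B73/Mo17 (Cy3/Cy5) hybridization signals obtained
    with undigested and with probe-enzyme-digested sample DNA.
    """
    low, high = NEAR_ONE
    if not low <= ratio_undigested <= high:
        # A marked difference even without digestion most likely reflects a
        # polymorphism in a sampling-enzyme site rather than a bona fide ESP.
        return False
    if source_line == "B73":
        # B73-derived probe: signal lost with digested B73 DNA.
        return ratio_digested < 1 / FOLD_CUTOFF
    if source_line == "Mo17":
        # Mo17-derived probe: signal lost with digested Mo17 DNA.
        return ratio_digested > FOLD_CUTOFF
    raise ValueError(f"unknown inbred line: {source_line!r}")


print(retain_probe("B73", 1.1, 0.05))   # True
print(retain_probe("Mo17", 0.9, 0.05))  # False
```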



Step 2. Development of a corn ESP micro-array
Sequencing of the candidate ESPs and design of marker-specific primers.
Clones corresponding to the probes that yield the desired hybridization pattern (Figure
8) are sequenced. The majority of the insert DNAs derived from these clones contain
a single recognition site for the probing enzyme. For each unique candidate ESP, two
specific PCR primers flanking the restriction site are designed.

In addition, the sequences of a limited set of probes that yielded invariant
hybridization signals are also determined. PCR primers targeting these monomorphic
sequences are included as references; they are used to calibrate the hybridization
signals.

Validation of the candidate ESPs and fabrication of corn micro-arrays.
The candidate ESPs identified under step 1 are subjected to a confirmatory
experiment using the format-III approach. First, four pre-amplification reactions are
performed with a single primer pair, using as template material the PstI-BfaI fragments,
either undigested or digested with one of the three probing enzymes. These
amplification reactions reduce the complexity of the DNA under study by more than
two orders of magnitude while at the same time generating a large enough amount of
material for the subsequent multiplex marker-specific PCRs. The pre-amplification products
are then used for the PCR rescue of each of the characterized candidate ESPs using
dedicated primer pairs [refer to Wang, D. G. et al., Science 280:1077-1082 (1998)].
Particular sets of the ESP-specific primers that amplify the same type of ESP (i.e.
ESPs for one particular probing enzyme) are combined in a single reaction, together
with the appropriate pre-amplification material as template. One of the ESP-specific
primers is either Cy3- or Cy5-labeled; the other remains unlabeled. The Cy3-primers
are used for the multiplex amplification of the DNA that had previously been digested
with a probing enzyme, whereas the Cy5-primers are used with undigested control
DNA. The PCR products from the various multiplex reactions performed on both
digested and undigested DNA are pooled together to obtain a single hybridization
mixture per starting DNA. The B73- and Mo17-derived material is analyzed in
parallel experiments. The set of unlabeled ESP-specific PCR primers serves as
hybridization probes and is arrayed in the same way as amplification products.

Conditions used are similar to those previously described for hybridization against
oligonucleotide probes and are readily determined by one of ordinary skill in the art.

Direct comparison of the normalized Cy3 and Cy5 hybridization signals
allows determination of the allelic state of the endonuclease target site in B73 versus
Mo17. Primer pairs that do not allow unambiguous allele calling, or that do not
confirm the candidate ESPs identified with PstI-BfaI sampling (refer to step 1), are not
retained for further work.
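A minimal Python sketch of this format-III comparison, under two stated assumptions: that the monomorphic reference amplicons carry no probe-enzyme site, so their Cy3/Cy5 ratio should be 1 and the median observed ratio can serve as a dye-bias correction; and that a normalized ratio of 0.5 is an adequate boundary between "amplified" and "not amplified". The function names and the threshold are illustrative only.

```python
from statistics import median


def dye_scale_factor(ref_cy3, ref_cy5):
    """Median Cy3/Cy5 ratio over the monomorphic reference amplicons."""
    # Assumption: the references lack the probe-enzyme site, so any deviation
    # of their ratio from 1 reflects labeling or scanning bias.
    return median(c3 / c5 for c3, c5 in zip(ref_cy3, ref_cy5))


def site_state(cy3_digested, cy5_undigested, scale, threshold=0.5):
    """Call the probe-enzyme site of one ESP in one inbred line."""
    ratio = (cy3_digested / cy5_undigested) / scale
    # Amplification after digestion (ratio near 1): the site is absent.
    # Loss of signal after digestion (ratio near 0): the site was cleaved.
    return "absent" if ratio >= threshold else "present"


def confirms_esp(b73_state, mo17_state):
    """A candidate ESP is confirmed only if B73 and Mo17 differ at the site."""
    return b73_state != mo17_state


scale = dye_scale_factor([1100, 1000, 1200], [1000, 1000, 1000])  # median ratio 1.1
b73 = site_state(40, 1000, scale)
mo17 = site_state(1150, 1000, scale)
print(b73, mo17, confirms_esp(b73, mo17))  # present absent True
```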

Step 3. Genetic analysis of a B73/Mo17 recombinant inbred population and of segregating populations

Genetic analysis of a B73/Mo17 recombinant inbred population. A collection of
recombinant inbred lines derived from a cross between B73 and Mo17 is publicly
available and provides a most useful set of lines for verifying and mapping the
collection of ESP markers. The advantage of recombinant inbred lines over segregating
populations is that each inbred line contains a different set of homozygous chromosome
segments derived from either parent line. Consequently, each ESP will be scored as
either present or absent. Preparation of the sample DNAs and hybridization against the
arrayed probes are performed as described under step 2. The experiment will, in the
first place, allow the testing of selected ESPs in over 100 measurements; the results
will lead to the development of a second-generation system that detects only the
most consistent ESPs. In addition, the linkage analysis of the segregation data will
allow the construction of a fine genetic map of the markers. Finally, based on the
mapping data, an ordered ESP micro-array is developed for corn.

Genetic analysis of segregating populations. Although the markers were isolated from two
inbred lines, it is anticipated that the above-mentioned ordered ESP micro-arrays will
detect sufficient genetic polymorphism in other corn lines to be useful for
marker-assisted selection. To demonstrate the applicability, one could choose either a
segregating F2 population or a back-cross population. Sample preparations and
hybridizations are again performed as described under step 2. In this experiment, the
ESP markers must be scored quantitatively so as to differentiate between heterozygosity
and homozygosity. Because only the most consistent markers are retained, a two-fold
difference in signal intensity is easily monitored. The approach used consists of
normalizing the hybridization signal intensities and then applying a mixture model
analysis to the normalized data. This statistical approach consists of determining
whether the relative signal intensities can be grouped into three discrete classes,
corresponding respectively to homozygous present, heterozygous and homozygous
absent. ESP markers that do not fulfill this criterion should be eliminated from the
analysis.
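The mixture-model step can be sketched in plain Python as a three-component, one-dimensional Gaussian mixture fitted by expectation-maximization. This is an illustrative stand-in, not necessarily the statistical procedure used here: it assumes the normalized intensities cluster roughly normally around three levels, with a higher signal meaning the marker is present, initializes the components at the minimum, midpoint and maximum of the data, and uses a hypothetical separation rule (adjacent class means at least three standard deviations apart) as the "three discrete classes" criterion.

```python
import math


def _normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_three_class_mixture(values, n_iter=200, min_sigma=1e-3):
    """Fit a 3-component 1-D Gaussian mixture by expectation-maximization."""
    lo, hi = min(values), max(values)
    means = [lo, (lo + hi) / 2, hi]               # start at min, midpoint and max
    sigmas = [max((hi - lo) / 6, min_sigma)] * 3
    weights = [1 / 3] * 3
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each sample.
        resp = []
        for x in values:
            dens = [w * _normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(3):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)
    labels = [max(range(3), key=lambda k: r[k]) for r in resp]
    return means, sigmas, labels


CLASS_NAMES = ("homozygous absent", "heterozygous", "homozygous present")


def call_marker(values, min_separation=3.0):
    """Return one class name per sample, or None if the marker fails the criterion."""
    means, sigmas, labels = fit_three_class_mixture(values)
    order = sorted(range(3), key=lambda k: means[k])  # rank classes by mean
    for a, b in zip(order, order[1:]):
        if means[b] - means[a] < min_separation * max(sigmas[a], sigmas[b]):
            return None  # classes overlap: eliminate this ESP marker
    rank = {k: i for i, k in enumerate(order)}
    return [CLASS_NAMES[rank[k]] for k in labels]
```

A marker for which `call_marker` returns `None` would be dropped; in practice a library implementation of Gaussian mixtures could replace the hand-written EM loop.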

Example 3 

Human Genetic Analysis Using the Format-I RAA 
This example illustrates the application of the method of the invention
for genome-wide genetic analysis in humans. The human genome is an example of a
high-complexity genome (size ~3,000 Mb) combined with a very low level of genetic
variability. Single nucleotide differences between pairs of allelic sequences from
different individuals occur approximately once in every 1,000 base pairs; in the
population at large, the frequency may be on the order of 1 in 300. As with Arabidopsis,
such a low frequency necessitates the use of a selection procedure for the
isolation/enrichment of the rare ESP-harboring fragments. In this example a batch-wise
hybridization is used to accomplish this.

Based on the known mutation frequencies, it can be estimated that the
ESP frequency for a tetracutter probing enzyme is on the order of 1 in 125 recognition
sites. This low level of genetic variation, in combination with the sensitivity of
micro-array hybridization, limits the number of ESPs that can be detected in a single assay
(typically ranging from a few hundred to one thousand, a few thousand at the most).
These limitations can, to a certain extent, be overcome by choosing probing enzymes
that recognize tetranucleotide sites containing a CpG dinucleotide. Indeed, it is well
documented that a substantial fraction (> 23%) of the nucleotide substitutions in the
human genome result from C→T transitions in CpG dinucleotides. Such CpG
dinucleotides represent mutational hotspots in vertebrates because a large fraction of
the cytosines are methylated and subsequently mutate to thymine by deamination. It is
estimated that the mutation frequency of methylated cytosines is 6- to 8-fold higher than
average. Hence probing enzymes that cleave CpG-containing recognition sites will
yield ESPs at correspondingly higher frequencies, estimated at ~5%. However, the
adverse consequence of the high mutation rate is that CpG is relatively rare in
mammalian DNA, occurring with a frequency of 1 in 100 nucleotides [Wang, D. G.
et al., Science 280:1077-1082 (1998)] instead of 1 in 16. Likewise, the frequency of
CpG-containing tetranucleotide sites is ~1 in 1,600 instead of 1 in 256 bases. To
compensate for this, a probe endonuclease reagent can be used comprising two or
more of the following complementary restriction enzymes: TaqI (TCGA), MspI
(CCGG), MaeII (ACGT), and HinPI or HhaI (GCGC). It should be noted, however, that
cleavage by MaeII, as well as by the isoschizomers HinPI and HhaI, is blocked by
methylation of the cytosine residue within the CpG dinucleotide. These enzymes
will thus only cleave at a fraction of their sites, namely the non-methylated sites.
Analysis of the large amount of publicly accessible human genomic DNA sequence
shows that the cocktail of the 4 enzymes will cleave once in every 400 bp on average.
The total number of sites in the genome is thus on the order of 7.5 million. Assuming
that the ESP frequency is 5%, the enzyme cocktail has the potential of detecting
~375,000 ESPs. In addition to using combinations of restriction endonucleases, one
may also use reaction conditions that decrease the cleavage specificity. Such a strategy
has been used to obtain a restriction endonuclease reagent, designated CGaseI, that
is capable of cleaving DNA at CpG dinucleotides [Mead D. et al., WO 94/21663].
This CGaseI restriction endonuclease reagent may be particularly useful for the analysis
of human polymorphisms using the methods of the present invention.
The example described below illustrates the approach in a limited-scale
assay, which characterizes the human ESPs within CpG-containing tetranucleotide
recognition sites using the sampling enzyme combination PacI-BfaI. The rare cutter
PacI is estimated to have only about 50,000 cleavage sites in the human genome; the
frequent cutter BfaI will generate two fragments per PacI site. The enzyme combination
will, therefore, create a moderately complex set of 100,000 PacI-BfaI target fragments.
This fragment set captures a sizable number of CpG-containing restriction sites,
estimated to be on the order of 40,000. Assuming a 5% ESP frequency, the number of
detectable ESPs is on the order of 2,000. It should be stressed that many different
sampling enzyme combinations can be used and that, thus, a substantial fraction of the
~375,000 ESPs located within NCGN-type restriction sites can be monitored.
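The back-of-the-envelope estimates in the two preceding paragraphs can be reproduced with a few lines of Python. The figures (genome size, cleavage interval, site counts, 5% ESP frequency) are those given in the text; the helper name `expected_esps` is only illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3,000 Mb human genome
ESP_FREQUENCY = 0.05             # estimated ESP frequency at CpG-containing sites


def expected_esps(n_sites: float, esp_frequency: float = ESP_FREQUENCY) -> float:
    """Expected number of ESPs among n_sites probe-enzyme sites."""
    return n_sites * esp_frequency


# CpG-containing tetranucleotide spacing: 1/256 scaled by the observed (1/100)
# versus expected (1/16) CpG frequency.
tetra_cpg_spacing = 256 * 100 / 16             # 1,600 bp

# Four-enzyme NCGN cocktail cleaving once every ~400 bp.
cocktail_sites = GENOME_SIZE_BP / 400          # 7.5 million sites
cocktail_esps = expected_esps(cocktail_sites)  # ~375,000 ESPs

# PacI-BfaI sampling: two BfaI-ended fragments per PacI site.
pac_bfa_fragments = 50_000 * 2                 # 100,000 target fragments
pac_bfa_esps = expected_esps(40_000)           # ~2,000 ESPs among ~40,000 CpG sites

print(round(tetra_cpg_spacing), round(cocktail_sites), round(cocktail_esps))
print(pac_bfa_fragments, round(pac_bfa_esps))
```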

The procedure outlined in this example comprises the following steps: 

(1) Development of a set of candidate PacI-BfaI ESP fragments

(2) Genetic analysis of humans using ESP probe fragments 

Step 1. Development of a set of PacI-BfaI probe fragments
A mixture of sample fragments, derived from various individuals in the
population, can be divided into three classes with respect to sites for the probing enzyme:
monomorphic fragments that are devoid of a cleavage site, fragments that are always
cleaved, and fragments that carry one polymorphic recognition site. Fragments that are
digested will be referred to as S+ fragments and fragments lacking the site as S-
fragments. Polymorphic ESP fragments will thus be the only fragments present in both
the S+ and S- populations of sampling fragments. This forms the basis for their
selection by batch-wise hybridization: only ESP fragments are capable of annealing
when the S+ and S- fragment collections are mixed. The hybridization-selection can be
performed in two different, reciprocal ways: either the S+ fragments can be used to
retrieve the matching S- fragments, or the S- fragments are used to collect the
complementary S+ sampling fragments. In one approach, the selected candidate ESP
fragments may be isolated by cloning, arrayed, and subsequently validated by testing
various sample DNAs (e.g. the various sample DNAs used as starting material for the
hybridization-selection). Candidate ESP probe fragments that appear to detect
monomorphic sample fragments may either be removed from the array or retained as
control elements on the array. An alternative approach consists of performing the two
reciprocal hybridization-selections, cloning the selected fragments, and identifying
ESPs by matching S+ and S- fragments. The latter strategy is outlined
below.
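The selection principle, that only ESP fragments occur in both the S+ and the S- collections, can be illustrated with set operations in Python. Here each sampling fragment is identified by a hypothetical locus label, and for each chromosome in the pooled DNA a boolean records whether that copy carries the probe-enzyme site; the data are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def s_plus_s_minus(site_present: Dict[str, List[bool]]) -> Tuple[Set[str], Set[str]]:
    """Split pooled sampling fragments into the S+ and S- collections."""
    # A fragment contributes to S+ if any copy carries the site (it is cleaved)...
    s_plus = {locus for locus, calls in site_present.items() if any(calls)}
    # ...and to S- if any copy lacks the site (it stays intact).
    s_minus = {locus for locus, calls in site_present.items() if not all(calls)}
    return s_plus, s_minus


pool = {
    "frag_a": [True, True, True],     # always cleaved (monomorphic)
    "frag_b": [False, False, False],  # never cleaved (monomorphic)
    "frag_c": [True, False, True],    # polymorphic recognition site
}
s_plus, s_minus = s_plus_s_minus(pool)
esp_candidates = s_plus & s_minus  # only ESP fragments anneal across the two sets
print(sorted(esp_candidates))      # ['frag_c']
```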

(i) Preparation of S+ and S- fragments. The preferred starting
material is an equimolar mixture of genomic DNA from a number of representative
individuals. Such individuals (ranging from 5 to 50) may be chosen from various
CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [Wang, D. G. et al.,
Science 280:1077-1082 (1998)]. Following cleavage of the DNA mixture with the
PacI/BfaI combination of sampling enzymes, appropriate oligonucleotide adapters as
described above are ligated to the fragment ends. This template DNA is divided into two
aliquots, which are treated separately to prepare the S+ and S- fragment mixes, respectively. To
prepare the S- fragment mix, the target DNA fragments are cleaved with the probing
enzyme and then amplified. This will result in a mixture of fragments that do not
contain sites for the probing enzyme. Furthermore, the S- fragment mixture may be
prepared using one biotinylated primer, such that the resulting PCR product can be
captured onto a solid substrate, such as magnetic beads conjugated with streptavidin.
S+ fragments are prepared by (1) amplifying the mixture of PacI-BfaI fragments, (2)
digesting the PCR product with one of the four NCGN-recognizing enzymes, (3)
ligating appropriate adapters to the ends generated by the probing enzyme (see
EP 0 534 858, incorporated herein by reference), and (4) re-amplifying the resulting
material using one primer that recognizes the probe enzyme adapter and one primer that
recognizes one specific sampling enzyme adapter. As with the S- fragments, the
amplification reaction can be performed using a biotinylated primer that
matches the probe enzyme adaptor, such that the S+ fragment mixture can be
immobilized.

Two alternative pairs of PacI and BfaI adapters, as well as the
corresponding non-selective primers, are used; e.g. set I is used for the amplification
of the S- fragments and set II for the preparation of S+ fragments:
Set I

5'-CTCGTAGACTGCGTACCCAT (SEQ ID NO: 17)
3'-CATCTGACGCATGGG (SEQ ID NO: 18)

5'-GACGATGAGTCCTGAG (SEQ ID NO: 3)
3'-TACTCAGGACTCAT (SEQ ID NO: 4)

5'-GACTGCGTACCCATTA (SEQ ID NO: 19)

5'-GATGAGTCCTGAGTAG (SEQ ID NO: 10)

Set II

5'-GAGCATCTGACGCATGGGAT (SEQ ID NO: 20)
3'-GTAGACTGCGTACCC (SEQ ID NO: 21)

5'-CTGCTACTCAGGACTG (SEQ ID NO: 13)
3'-ATGAGTCCTGACAT (SEQ ID NO: 14)

5'-CTGACGCATGGGATTA (SEQ ID NO: 22)

5'-CTACTCAGGACTGTAG (SEQ ID NO: 16)

The adaptor ligated to the ends generated by the NCGN-cleaving probing enzyme and
the corresponding amplification primer have the following structures:

5'-GTCCTCATCGAGCATG (SEQ ID NO: 23)
3'-AGTAGCTCGTACGC (SEQ ID NO: 24)

5'-CCTCATCGAGCATGCG (SEQ ID NO: 25)


(ii) Hybridization-selection step(s). The S- fragment mix is
hybridized to the biotinylated S+ fragments. Following hybridization, the biotinylated
products are captured onto streptavidin-coated magnetic beads. The beads are
repeatedly washed to remove all unhybridized fragments, and thereafter the hybridized
S- fragments are eluted. These are then reamplified with the PacI and BfaI primers, and
the hybridization-selection procedure is repeated at least once. Finally, the amplified
fragments are cloned in an appropriate vector and a series of around 2,000 inserts is
sequenced. To select a set of S+ fragments, this procedure is repeated in reverse, this
time using biotinylated S- fragments. Upon comparison of the S+ and S- sequences, ESP
fragments are readily identified as fragments having partially overlapping sequences
and in which the S- fragment sequence shows a mutated NCGN restriction site at the
internal boundary of the overlap. In this way, ~500 ESPs are readily characterized.
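The final comparison step can be sketched as a naive string search in Python. The sketch rests on simplifying assumptions that are not taken from the text: each S+ read is oriented from the sampling-enzyme end and ends with the complete 4-bp NCGN recognition site, each S- read starts at the same sampling-enzyme end, and adapter sequences have already been trimmed. A pair is reported when the S- read contains the S+ sequence up to the site, followed by a 4-mer that differs from the site at exactly one position. The read identifiers and sequences are hypothetical.

```python
from typing import Dict, List, Tuple

# NCGN sites of the probing-enzyme cocktail: TaqI, MspI, MaeII, HinPI/HhaI.
PROBE_SITES = {"TCGA", "CCGG", "ACGT", "GCGC"}


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def match_esp_pairs(s_plus: Dict[str, str], s_minus: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (S+ id, S- id, intact site, mutated site) for each matching pair."""
    pairs = []
    for p_id, p_seq in s_plus.items():
        site = p_seq[-4:]
        if site not in PROBE_SITES:
            continue  # read does not end in a probe-enzyme site
        overlap = p_seq[:-4]
        for m_id, m_seq in s_minus.items():
            if not m_seq.startswith(overlap):
                continue
            window = m_seq[len(overlap):len(overlap) + 4]
            # The S- allele must carry a single-base change in the site.
            if len(window) == 4 and window not in PROBE_SITES and hamming(window, site) == 1:
                pairs.append((p_id, m_id, site, window))
    return pairs


s_plus = {"p1": "GATTACAGGCTTCGA"}
s_minus = {"m1": "GATTACAGGCTTTGACCATGA", "m2": "CCCCGGGGAAAATTTT"}
print(match_esp_pairs(s_plus, s_minus))  # [('p1', 'm1', 'TCGA', 'TTGA')]
```

The all-against-all loop is adequate for a few thousand sequenced inserts; a larger collection would call for indexing the S- reads by prefix.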

Step 2. Genetic analysis of humans using ESP probe fragments
The sequence-verified ESP fragments are spotted on micro-arrays for
genetic analysis of human sample DNA. For the preparation of this sample DNA, a pair
of adaptors/primers is used that differs from those attached to the arrayed S- or S+ set of
ESP fragments. From each individual, an undigested control sample and a probe-enzyme-digested
test sample are prepared. These samples are labeled with Cy3 and Cy5, mixed
and hybridized to the micro-arrays as described before. Alternatively, the hybridization
mixture may be composed of differentially labeled test DNA and previously genotyped
control DNA, both digested with the probing endonuclease. In both cases, the Cy3
(test/digested sample) and Cy5 (control/undigested DNA) signal intensities are normalized
using a number of monomorphic control probes. The ratio of these normalized Cy3/Cy5
signals for each of the ESP probes allows accurate determination of the allelic state of the
sample at each polymorphic site (homozygous S+/S+, homozygous S-/S-, heterozygous
S+/S-).
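For the first hybridization mode (probe-enzyme-digested test sample against undigested control from the same individual), a simple allele caller can assign each normalized Cy3/Cy5 ratio to the nearest expected value. The expected ratios follow from the assay logic (about 0 for S+/S+, since both copies are cleaved and not amplified; about 0.5 for S+/S-; about 1 for S-/S-), while the no-call margin of 0.2 and the function name are hypothetical.

```python
# Expected normalized Cy3 (digested) / Cy5 (undigested) ratio per genotype.
EXPECTED_RATIO = {"S+/S+": 0.0, "S+/S-": 0.5, "S-/S-": 1.0}


def call_genotype(normalized_ratio: float, max_distance: float = 0.2) -> str:
    """Assign the genotype whose expected ratio is closest, or 'no call'."""
    genotype = min(EXPECTED_RATIO, key=lambda g: abs(EXPECTED_RATIO[g] - normalized_ratio))
    if abs(EXPECTED_RATIO[genotype] - normalized_ratio) > max_distance:
        return "no call"  # ambiguous signal: too far from every expected class
    return genotype


for r in (0.04, 0.46, 1.08, 0.27):
    print(r, call_genotype(r))
```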

The micro-array hybridization experiment may first be performed with
the sample DNAs of the collection of individuals from which the ESP probe fragments
were isolated. Such an experiment will confirm the polymorphic nature of the selected
probe fragments and allow their testing in a multitude of measurements. The data will
also yield information on the allele frequencies among an appreciable number of
chromosomes.
Human genetic analysis using format-II RAA
As described for corn in Example 2, the format-I ESP assay for human
genetic analysis may be converted to a format-II or a format-III assay. Based on the
sequence of the selected and experimentally validated ESP fragments, it is indeed
possible to design a pair of dedicated, i.e. ESP-specific, PCR primers. Such primers
can be combined in a number of parallel multiplex reactions, which are in turn
combined to obtain the sample DNA [Wang, D. G. et al., Science 280:1077-1082
(1998)]. This sample DNA is hybridized against a micro-array of spotted S+ ESP
fragments (see Example 3). The experiment is set up such that the fluorescently
labeled ESP-specific primer and the S+ sequences are located on opposite sides of the
polymorphic site. Alternatively, the unlabeled ESP-specific amplification primers may
be arrayed as hybridization probes. The development of a format-II or format-III assay
need not be preceded by the identification of ESP fragments (using one of the methods
described in the previous examples). In the present example, we describe the
development of an RAA assay based on the sequence of previously discovered SNPs.

Close inspection of the known SNPs reveals that a significant percentage
of them are associated with both the loss and the gain of a restriction recognition site, i.e.
each of the two allelic sequences is associated with a different restriction recognition site.
The single nucleotide substitution may inter-convert recognition sequences that are
identical except for one nucleotide [e.g. PleI (GACTC) and HgaI (GACGC), HgaI and
SfaNI (GATGC), SfaNI and BbvI (GCTGC)]. Alternatively, the allelic recognition
sites may be partially overlapping [e.g. MaeII (ACGTg) and NlaIII (aCATG)]; in the
latter case the inter-conversion depends on the nature of the upstream or downstream
sequences. Such mutually exclusive restriction site allelism offers a distinct advantage.
The RAA technique will normally only detect the allele that is devoid of a recognition
site for the probing enzyme; therefore, determination of the zygosity requires careful
calibration of the signal against that observed with undigested control DNA. When each
allele is associated with the presence/absence of a restriction site, two parallel RAA
assays can be performed, each involving digestion with one of the alternative enzymes.
With such an assay, both alleles can be positively identified and the zygosity is readily
determined. The two parallel assays are best performed in a two-color mode; one of
the primers is differentially labeled (e.g. with Cy3 and Cy5 as described previously)
such that the amplification reactions can be mixed and hybridized against a single array
of probes.
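With mutually exclusive restriction site alleles, the two-color readout can be interpreted as in the Python sketch below. Allele A is taken to carry the site for enzyme A (for example MaeII) and allele B the site for enzyme B (for example NlaIII); the assignment of Cy3 to the enzyme-A digest and Cy5 to the enzyme-B digest, the pre-normalized signals and the detection limit are all hypothetical choices for illustration.

```python
def call_two_enzyme_genotype(cy3_after_a: float, cy5_after_b: float, detection_limit: float) -> str:
    """Genotype a bi-allelic site from two parallel RAA digests.

    cy3_after_a: signal after digestion with enzyme A; only allele B survives.
    cy5_after_b: signal after digestion with enzyme B; only allele A survives.
    """
    has_b = cy3_after_a > detection_limit
    has_a = cy5_after_b > detection_limit
    if has_a and has_b:
        return "A/B"      # both alleles positively detected: heterozygote
    if has_a:
        return "A/A"
    if has_b:
        return "B/B"
    return "no call"      # neither allele amplified, e.g. a failed reaction


print(call_two_enzyme_genotype(900.0, 850.0, 100.0))  # A/B
print(call_two_enzyme_genotype(20.0, 900.0, 100.0))   # A/A
```

Because each allele gives its own positive signal, zygosity follows directly from which signals exceed background, without calibration against undigested control DNA.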

We have systematically explored the SNP database of the Whitehead
Institute for mutational changes that promote restriction site inter-conversions and have
calculated their frequency of occurrence. Two SNP-associated recognition site
inter-conversions were found to occur at high frequency: MaeII ↔ NlaIII and HgaI ↔
SfaNI. In both cases the mutational changes converting one site into another are C→T
(or G→A) transitions occurring in CpG dinucleotides. This finding is entirely consistent
with the fact that this type of mutation occurs with a 6-8 times higher frequency than other
nucleotide substitutions. Based on the number of SNPs found in the Whitehead database,
we estimate the total number of SNPs in the human genome for the enzyme pairs
MaeII/NlaIII and HgaI/SfaNI at 30,000 and 15,000, respectively. These numbers are
presumably somewhat overestimated since both MaeII and HgaI are susceptible to CpG
methylation. Consequently, the inter-conversion can only be measured at the
non-methylated sites. Therefore, in practice, RAA assays designed on the basis of sequence
data should be validated on a number of test samples. Assays in which no cleavage takes
place at the CpG-containing site in any of the individuals tested should be eliminated
from the RAA bi-allelic marker systems.
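A database scan of this kind can be sketched in Python by testing, for each SNP with its flanking sequence, which recognition sites overlap the variable position in each allele. The four enzymes and their recognition sequences are those named above; the function names and example flanks are hypothetical, and a sequence scan cannot see CpG methylation, which is why the experimental validation described above remains necessary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITES = {"MaeII": "ACGT", "NlaIII": "CATG", "HgaI": "GACGC", "SfaNI": "GATGC"}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sites_covering(seq: str, pos: int) -> set:
    """Enzymes whose site, on either strand, overlaps position pos of seq."""
    found = set()
    for enzyme, site in SITES.items():
        for motif in {site, revcomp(site)}:   # a set, so palindromes count once
            start = seq.find(motif)
            while start != -1:
                if start <= pos < start + len(motif):
                    found.add(enzyme)
                start = seq.find(motif, start + 1)
    return found


def site_interconversion(left: str, allele1: str, allele2: str, right: str):
    """Return (sites only in allele 1, sites only in allele 2) at a SNP."""
    pos = len(left)
    sites1 = sites_covering(left + allele1 + right, pos)
    sites2 = sites_covering(left + allele2 + right, pos)
    return sites1 - sites2, sites2 - sites1


print(site_interconversion("TTAC", "G", "A", "TGTT"))  # ({'MaeII'}, {'NlaIII'})
print(site_interconversion("TTGA", "C", "T", "GCTT"))  # ({'HgaI'}, {'SfaNI'})
```

SNPs for which both returned sets are non-empty are candidates for the two-enzyme, two-color RAA assay.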

The foregoing examples are illustrative of the invention and are not intended to limit
the scope of the invention as set out in the claims. All of the references cited herein are
incorporated by reference.
WE CLAIM:

1. A method for detecting an endonuclease site polymorphism (ESP) in
DNA, the method comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) treating the target DNA fragments obtained in step (b) with a
probe restriction endonuclease reagent;

(d) amplifying the probe restriction endonuclease reagent treated
target DNA fragments of step (c);

(e) analyzing the DNA of step (d) to determine which target
fragments are amplified and/or which target fragments are not amplified; and wherein
target DNA fragments which are amplified lack a recognition site for the probe
restriction endonuclease reagent and target fragments having a recognition site for the
probe restriction endonuclease reagent are not amplified.

2. The method of claim 1 wherein the concomitantly amplifiable target DNA
fragments of step (b) are derived by treatment of the sample DNA with a sampling
restriction endonuclease reagent.

3. The method of claim 2 wherein the concomitantly amplifiable DNA 
fragments of step (b) are derived from sample DNA by treatment of the sample DNA 
with a first and a second restriction endonuclease reagent. 

4. The method of claim 3 wherein said first restriction endonuclease 
reagent has a recognition sequence of six or more nucleotides and the second restriction 
endonuclease reagent has a recognition sequence of four or fewer nucleotides.

5. The method of claim 3 or 4 wherein said concomitantly amplifiable
target DNA fragments are derived by stepwise treatment of said sample DNA with the
6. The method of claim 1 further comprising preparing PCR primers
which flank the endonuclease site polymorphism (ESP) for use in amplifying said
concomitantly amplifiable target DNA fragments.

7. The method of claims 1, 2, 3, and 4 wherein the concomitantly
amplifiable DNA fragments are modified by ligation of adapters to both termini of said
fragments, and wherein said adaptors are capable of serving as primers for
amplification.

8. The method of claim 5 wherein the concomitantly amplifiable DNA 
fragments are modified by ligation of adapters to both termini of said fragments, and 
wherein said adaptors are capable of serving as primers for amplification. 


9. The method of claim 1 wherein the probe restriction endonuclease 
reagent of step (c) has a recognition sequence comprising six or more nucleotides. 

10. The method of claim 1 wherein the probe restriction endonuclease
reagent of step (c) has a recognition sequence comprising four or more nucleotides.

11. The method according to claim 1 wherein the probe restriction 
endonuclease of step (c) has a recognition sequence of two nucleotides. 

12. The method according to claim 1 wherein the order of the steps (b) and
(c) are reversed or carried out simultaneously.

13. The method according to claim 1 wherein said endonuclease site
polymorphism is an alteration in a concomitantly amplifiable target fragment giving
rise to a nucleotide sequence that is recognized and cut by the probe restriction
endonuclease reagent.
14. The method of claim 1 wherein said site polymorphism is an alteration
in the nucleotide sequence of a concomitantly amplifiable target fragment which
eliminates a recognition sequence for said probe restriction endonuclease reagent.

15. The method of claims 1, 2, 3 and 4 wherein said concomitantly
amplifiable DNA fragments are amplified by a polymerase chain reaction.

16. The method of claim 5 wherein said concomitantly amplifiable DNA
fragments are amplified by a polymerase chain reaction.

17. The method of claim 1 wherein amplified target fragments are
identified by their ability to hybridize to cognate probe DNA fragments.

18. A method for obtaining probe DNA fragments for use in detecting
endonuclease site polymorphisms, the method comprising:

(a) isolating sample DNA;

(b) deriving a set of concomitantly amplifiable target DNA fragments
from the sample DNA;

(c) selecting from the target DNA fragments probe DNA fragments
having an endonuclease site polymorphism (ESP) for the probe restriction
endonuclease.

19. The method of claim 17 wherein said probe DNA fragments are derived
by digestion of sample DNA with one or more sampling restriction endonuclease
reagents.

20. The method of claim 18 wherein probe DNA fragments are derived by
digestion of a pool of sample DNAs obtained from one or more individuals of a
species.

21. The method of claim 18 wherein the probe DNA fragments are derived
by digestion of a pool of sample DNAs obtained from 10 or more individuals of a
species.

22. The method of claim 18 wherein the probe DNA fragments are derived by
digestion of a pool of sample DNAs obtained from a pool of 50 or more individuals of a
species.

23. The method of any one of claims 19-21 wherein said species is selected
from the group consisting of procaryotic species and eucaryotic species.

24. A method for obtaining probe DNA fragments for use in detecting 
endonuclease site polymorphisms (ESP) comprising preparing synthetic
oligonucleotides based on the nucleotide sequence of amplifiable target DNA fragments
containing endonuclease site polymorphism(s).

25. A method for producing a microarray of probe DNA, the method
comprising: 

(a) isolating sample DNA; 

(b) deriving a set of concomitantly amplifiable target DNA fragments 
from the sample DNA; 

(c) selecting probe DNA fragments having restriction endonuclease site 
polymorphisms (ESPs) from the sample restriction endonuclease treated target DNA 
fragments of step (b); and 

(d) arraying the probe DNA fragments obtained in step (c) on a solid 
substrate in a predefined region by attaching the fragments to the substrate. 

26. The method of claim 24 wherein the DNA fragments of step (b) are 
obtained by treating sample DNA with one or more sample restriction endonuclease 
reagents.



27. The method of claim 24 wherein said probe DNA fragments of step
(d) are synthetic oligonucleotides which correspond to the concomitantly amplifiable
target DNA fragments derivable from said sample DNA and containing an
endonuclease site polymorphism (ESP).

28. The method of claim 25, 26 or 27 wherein the solid support is selected 
from a group consisting of a planar solid support, a bead, a sphere and a polyhedron. 

29. The method of claim 25 wherein the microarray comprises at least 2,000
probe fragments.

30. The method of claim 26 wherein the microarray comprises at least 2,000
synthetic oligonucleotides.

31. The method of claim 27 wherein the microarray comprises at least 2,000
probe fragments.

32. The method of claim 28 wherein the microarray comprises at least 2,000
probe fragments.

33. The method of claim 25 wherein the microarray comprises at least 
20,000 probe fragments. 

34. The method of claim 26 wherein the microarray comprises at least
20,000 synthetic oligonucleotides.

35. The method of claim 27 wherein the microarray comprises at least
20,000 probe fragments.

36. The method of claim 28 wherein the microarray comprises at least
20,000 probe fragments.

SEQUENCE LISTING



<110> METHEXIS N.V. 

<120> RESTRICTED AMPLICON ANALYSIS 

<130> 29314/34158A 

<140> 
<141> 

<150> 60/107,293 
<151> 1998-11-09 

<160> 28 

<170> PatentIn Ver. 2.0

<210> 1 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 1 

ctcgtagact gcgtacc 

<210> 2 
<211> 18 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 2 

aattggtacg cagtctac 

<210> 3 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer



<400> 3 

gacgatgagt cctgag 
<210> 4
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 4 

tactcaggac tcat

<210> 5 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature
<222> (17)

<223> At position 17 N = A, C, G, or T
<220>

<221> misc_feature
<222> (18) 

<223> At position 18 N = A, C, G, or T 
<400> 5 

gactgcgtac caattcnn 

<210> 6 
<211> 19 

<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature 
<222> (17) 

<223> At position 17 N = A, C, G, or T 
<220> 

<221> misc_feature 
<222> (18) 

<223> At position 18 N = A, C, G, or T 
<220> 

<221> misc_feature
<222> (19)

<223> At position 19 N = A, C, G, or T 
<400> 6 

gatgagtcct gagtagnnn 

<210> 7 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 7 

ctcgtagact gcgtacatgc a 

<210> 8 
<211> 14 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 8 

tgtacgcagt ctac 

<210> 9 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 9 

gactgcgtac atgcag 

<210> 10 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 10 

gatgagtcct gagtag 



<210> 11 
<211> 21 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 11 

gagcatctga cgcatgttgc a 

<210> 12 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 12 
acatgcgtca gatg 

<210> 13 
<211> 16 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: primer 
<400> 13 

ctgctactca ggactg 

<210> 14 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<400> 14 
tacagtcctg agta 

<210> 15 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: primer 
<400> 15 

ctgacgcatg ttgcag 16

<210> 16 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 16

ctactcagga ctgtag 16

<210> 17 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 17 

ctcgtagact gcgtacccat 20

<210> 18 
<211> 15 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: primer 

<220> 

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.



<400> 18 

gggtacgcag tctac 15 

<210> 19 
<211> 16 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: primer 
<400> 19 

gactgcgtac ccatta 16



<210> 20 
<211> 20 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 20 

gagcatctga cgcatgggat 

<210> 21 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<220> 

<223> Description of Artificial Sequence: primer 
<400> 21 

cccatgcgtc agatg 

<210> 22 
<211> 16 
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 22 

ctgacgcatg ggatta 

<210> 23 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 23 

gtcctcatcg agcatg 

<210> 24 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.
<220>

<223> Description of Artificial Sequence: primer 

<400> 24 
cgcatgctcg atga 

<210> 25 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 25 

cctcatcgag catgcg 

<210> 26 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 26 

gagcatctga cgcatcc 

<210> 27 
<211> 18 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> As presented in the Sequence Listing the
nucleotide sequence reads in the 5' to 3'
direction. As presented in the specification the
sequence reads in the 3' to 5' direction.

<220> 

<223> Description of Artificial Sequence: primer 
<400> 27 

aattggatgc gtcagatg 

<210> 28 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 



<400> 28 

ctgacgcatc caattc 
Supplementary Material

Principal Components Analysis

Although an analysis of variance (ANOVA) can be used to assess within-platform vs. between-platform
variability on a gene-by-gene basis, it cannot analyze multiple genes simultaneously. For this purpose, we used
a multivariate approach based on principal components analysis (PCA) (1), visualized with Partek Pro (v. 5)
software, to summarize within-platform vs. between-platform variability on a multidimensional scale.
As with the comparison of the distributions of Z-scores, comparing platforms with PCA requires that all
measurement technologies first be brought to a single common scale. To achieve this, we applied a van der
Waerden transformation to the expression values from each chip. This transformation replaces the data with
their ranks and then applies an inverse normal transformation to the result, giving (in the absence of tied values)
a perfectly normal distribution. Supplemental Figure 3a shows the first two principal components of the data,
which together account for most of the variability in the multidimensional dataset (34.9% + 37.3% = 72.2%).
Although the points (arrays) in Supplemental Figure 3a are color-coded by platform, the principal components are
computed without regard to platform category. Supplemental Figure 3a shows that the between-platform variability
is much greater than the within-platform variability: the first PC largely separates the cDNA platform from the two
oligonucleotide platforms, and the second PC separates all platforms, especially the two oligonucleotide platforms.
Any variability due to the treatment condition is clearly overshadowed by the platform effect. Supplemental
Figure 3b shows a dot plot of PC #3, which does capture the treatment effect. Because PC #3 explains 8.5% of the
variability in the entire dataset, differential expression due to the treatment appears to remain distinguishable
once platform effects are masked out.
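The van der Waerden transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Partek Pro implementation: it assumes the common definition of van der Waerden scores, the standard normal quantile of r/(n+1) for a value of rank r among the n values on one chip, and it gives tied values their average rank, a tie rule the text does not specify. The chip values are made up.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def van_der_waerden(values):
    """Replace values by ranks, then map rank r to the standard normal quantile of r/(n+1)."""
    n = len(values)
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return [standard_normal.inv_cdf(r / (n + 1)) for r in average_ranks(values)]


# Hypothetical expression values from a single chip (not data from the study).
chip = [120.0, 15.5, 980.2, 15.5, 430.0]
print([round(z, 3) for z in van_der_waerden(chip)])  # [0.0, -0.674, 0.967, -0.674, 0.431]
```

Apart from ties, every chip with n values receives the same set of normal scores, whatever its original intensity units, which is what puts arrays from different platforms on a common scale before PCA.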

Contingency Table Analysis 

After modeling gene expression with an ANOVA model at an alpha of .001 (Supplemental Table 6), we found that
the Amersham assay detected the largest number of differentially expressed genes (117), the Agilent assay
identified 67, and the Affymetrix assay 34. McNemar's test statistics computed for each of the three platform pairs
indicated significant differences (p < .017) in the number of genes found to be differentially expressed by each
technology. These differences may reflect the different levels of experimental variability associated with each
platform, as seen in Figure 3. For two of the comparisons, Fisher's exact test for association between the gene sets
was significant (p < .017), demonstrating that agreement between the two platforms was non-random. However,
when comparing the Amersham and Affymetrix gene lists, no significant (non-random) association was found.
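Both tests can be computed exactly from each 2x2 table of calls. The sketch below follows the standard definitions of the exact (binomial) McNemar test and the right-tailed Fisher exact test; it is not the software used for this analysis, and reading the .017 threshold as a Bonferroni correction of .05 over the three platform pairs is an assumption. The counts come from the Amersham-by-Affymetrix table of Supplemental Table 6 and the Agilent-by-Affymetrix table of Supplemental Table 7.

```python
from math import comb

# Assumed reading of the ".017" threshold: Bonferroni correction of .05 over three pairs.
ALPHA = 0.05 / 3


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant cell counts.

    Under the null hypothesis of equal detection rates, each gene called by
    exactly one platform is equally likely to fall in either discordant cell,
    so the smaller count is compared with a Binomial(b + c, 0.5) lower tail.
    """
    n = b + c
    lower_tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * lower_tail)


def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """Right-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are platform A (No, Yes) and columns platform B (No, Yes). With all
    margins fixed, this is the hypergeometric probability that at least d genes
    are called by both platforms, i.e. evidence of non-random agreement.
    """
    n = a + b + c + d
    row_yes = c + d  # genes called differentially expressed by platform A
    col_yes = b + d  # genes called differentially expressed by platform B
    upper = min(row_yes, col_yes)
    hits = sum(comb(row_yes, x) * comb(n - row_yes, col_yes - x)
               for x in range(d, upper + 1))
    return hits / comb(n, col_yes)


# Amersham (rows) by Affymetrix (columns), Supplemental Table 6.
p_mcnemar = mcnemar_exact(b=29, c=112)
p_fisher = fisher_right_tail(1863, 29, 112, 5)
print(p_mcnemar < ALPHA, p_fisher < ALPHA)  # True False

# Discordant counts 18 and 32 from the Agilent-by-Affymetrix table of Supplemental Table 7.
print(round(mcnemar_exact(18, 32), 4))  # 0.0649
```

McNemar's test looks only at the discordant cells, so it compares detection rates; Fisher's test conditions on the margins and asks whether the "Yes/Yes" overlap is larger than chance, which is why the two can disagree for the same pair of platforms.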

A subset of the genes in our lists showed statistically significant differential expression at an alpha cut-off of .001
but less than 2-fold differential expression. Because microarray technologies are frequently considered more accurate
at detecting genes differentially expressed at levels of 2-fold or greater (2), we applied an additional minimum
2-fold change criterion (in either direction) to the genes found significant at an alpha of .001, to assess whether it
would increase the overlap of differentially expressed genes detected by the platforms. Even with this additional
fold-change criterion, we could not reject the null hypothesis of no association between the Amersham and
Affymetrix gene lists at p < .017 (Supplemental Table 7). Only when the alpha cut-off was relaxed to .01, together
with the 2-fold minimum criterion, could we reject the null hypothesis of no association for all three pairwise
comparisons (Supplemental Table 8).
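The combined significance and fold-change filter amounts to a simple predicate. In this sketch the `GeneResult` record, its field names, and the example genes are hypothetical, and fold change is assumed to be a ratio of 24-hour to 0-hour expression, so a 2-fold decrease corresponds to a ratio of 0.5 or less.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene summary; field names are illustrative, not the study's."""
    unigene_id: str
    p_value: float      # ANOVA p-value for the treatment effect
    fold_change: float  # assumed ratio: 24-hour expression / 0-hour expression


def called_differential(gene: GeneResult, alpha: float = 0.001, min_fold: float = 2.0) -> bool:
    """True if the gene is significant at `alpha` AND changes at least `min_fold` either way."""
    significant = gene.p_value < alpha
    large_change = gene.fold_change >= min_fold or gene.fold_change <= 1.0 / min_fold
    return significant and large_change


# Made-up example genes with placeholder IDs (not results from the study).
genes = [
    GeneResult("gene_1", 0.0002, 2.6),   # significant, up 2.6-fold: kept
    GeneResult("gene_2", 0.0004, 1.4),   # significant, but under 2-fold: dropped
    GeneResult("gene_3", 0.0008, 0.45),  # significant, down about 2.2-fold: kept
    GeneResult("gene_4", 0.0050, 3.1),   # large change, not significant at .001
]

strict = [g.unigene_id for g in genes if called_differential(g)]
relaxed = [g.unigene_id for g in genes if called_differential(g, alpha=0.01)]
print(strict)   # ['gene_1', 'gene_3']
print(relaxed)  # ['gene_1', 'gene_3', 'gene_4']
```

Relaxing alpha from .001 to .01 while keeping the 2-fold requirement admits genes like `gene_4`, which is the kind of change that separates the Supplemental Table 7 and Table 8 gene lists.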



Supplemental Table 1: Pearson's product-moment and Spearman's rank-order correlation coefficients of gene 
expression measurements from 3 commercial microarray technologies matched by their Unigene ID. P-values of the 
hypothesis of no correlation are also reported. 



Comparison  Platform A  Platform B  Pearson's  P-value  Spearman's  P-value  N
1           Amersham    Agilent     0.54505    <.0001   0.54630     <.0001   8024
2           Amersham    Affymetrix  0.59118    <.0001   0.58727     <.0001   8024
3           Agilent     Affymetrix  0.56198    <.0001   0.55232     <.0001   8024



Supplemental Table 2: Pearson's product-moment and Spearman's rank-order correlation coefficients of the fold
change between the 0-hour and 24-hour measurements, matched by Unigene ID. P-values of the hypothesis of
no correlation are also reported.



Comparison  Platform A  Platform B  Pearson's  P-value  Spearman's  P-value  N
1           Amersham    Agilent     0.63374    <.0001   0.57991     <.0001   4012
2           Amersham    Affymetrix  0.56066    <.0001   0.53731     <.0001   4012
3           Agilent     Affymetrix  0.59903    <.0001   0.59549     <.0001   4012



Supplemental Table 3: Contingency tables of differential gene expression classifications from data matched by
Unigene ID and modeled with a mixed-model nested ANOVA at an alpha cut-off of .001. In each cell the frequency,
row percentage, and column percentage are reported, in that order. Probability values from Fisher's Exact Test of
association and McNemar's test of agreement are also reported.

Table of Agilent by Affymetrix

                Affymetrix
Agilent         No          Yes         Total
No              3800        57^         3857
                98.52       1.48
                96.50       77.03
Yes             138^        17*         155
                89.03       10.97
                3.50        22.97
Total           3938        74          4012

Table of Amersham by Affymetrix

                Affymetrix
Amersham        No          Yes         Total
No              3693        57^         3750
                98.48       1.52
                93.78       77.03
Yes             245^        17*         262
                93.51       6.49
                6.22        22.97
Total           3938        74          4012

Table of Amersham by Agilent

                Agilent
Amersham        No          Yes         Total
No              3650        100^        3750
                97.33       2.67
                94.63       64.52
Yes             207^        55*         262
                79.01       20.99
                5.37        35.48
Total           3857        155         4012

Platform A   Platform B   Exact P-value for McNemar's Test   Fisher's Exact Test P-value
Agilent      Affymetrix   <.0001                             <.0001
Amersham     Affymetrix   <.0001                             <.0001
Amersham     Agilent      <.0001                             <.0001

* significant non-random association
^ significantly different detection rates
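Each cell's row and column percentages follow directly from its count and the marginal totals. A short sketch (the helper name `cell_percentages` is hypothetical) recomputes them for the Agilent-by-Affymetrix table of Supplemental Table 3:

```python
def cell_percentages(table):
    """For each cell of a count table, return (count, row %, column %), rounded to 2 places."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [
            (count,
             round(100 * count / row_totals[i], 2),   # share of this row's total
             round(100 * count / col_totals[j], 2))   # share of this column's total
            for j, count in enumerate(row)
        ]
        for i, row in enumerate(table)
    ]


# Rows: Agilent No/Yes; columns: Affymetrix No/Yes (Supplemental Table 3 counts).
agilent_by_affymetrix = [[3800, 57], [138, 17]]
for row in cell_percentages(agilent_by_affymetrix):
    print(row)
# [(3800, 98.52, 96.5), (57, 1.48, 77.03)]
# [(138, 89.03, 3.5), (17, 10.97, 22.97)]
```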



Supplemental Table 4: Pearson's product-moment correlation coefficients of technical and biological replicate 
measurements. P-values of the hypothesis of no correlation are also reported. 



Platform    Technical Replicates  P-value  Biological Replicates  P-value  N
Affymetrix  0.91894               <.0001   0.91255                <.0001   4018
Amersham    0.99259               <.0001   0.98240                <.0001   4018
Agilent     0.98727               <.0001   0.96435                <.0001   4018



Supplemental Table 5: Average difference of fold change measured by two platforms 
Comparison Platform A Platform B Mean Difference N StdDev 

1 Amersham Agilent 0.0514661 2009 0.2502959 

2 Agilent Affymetrix 0.1033182 2009 0.3141808 

3 Amersham Affymetrix 0.1547843 2009 0.3347937 



Supplemental Table 6: Contingency tables of differential gene expression classifications using a mixed-model
nested ANOVA and an alpha cut-off of .001. In each cell the frequency, row percentage, and column percentage are
reported, in that order. Probability values from Fisher's Exact Test of association and McNemar's test of
agreement are also reported.





Table of Agilent by Affymetrix

                Affymetrix
Agilent         No          Yes         Total
No              1917        25^         1942
                98.71       1.29
                97.06       73.53
Yes             58^         9*          67
                86.57       13.43
                2.94        26.47
Total           1975        34          2009

Table of Amersham by Affymetrix

                Affymetrix
Amersham        No          Yes         Total
No              1863        29^         1892
                98.47       1.53
                94.33       85.29
Yes             112^        5           117
                95.73       4.27
                5.67        14.71
Total           1975        34          2009

Table of Amersham by Agilent

                Agilent
Amersham        No          Yes         Total
No              1848        44^         1892
                97.67       2.33
                95.16       65.67
Yes             94^         23*         117
                80.34       19.66
                4.84        34.33
Total           1942        67          2009

Platform A   Platform B   Exact P-value for McNemar's Test   Fisher's Exact Test P-value (Right)
Agilent      Affymetrix   0.0004                             <.0001
Amersham     Affymetrix   <.0001                             0.0441
Amersham     Agilent      <.0001                             <.0001

* significant non-random association
^ significantly different detection rates



Supplemental Table 7: Contingency tables of differential gene expression classifications using a mixed-model
nested ANOVA, an alpha cut-off of .001, and a 2-fold minimum criterion. In each cell the frequency, row percentage,
and column percentage are reported, in that order. Probability values from Fisher's Exact Test of association and
McNemar's test of agreement are also reported.



Table of Agilent by Affymetrix

                Affymetrix
Agilent         No          Yes         Total
No              1955        18          1973
                99.09       0.91
                98.39       81.82
Yes             32          4*          36
                88.89       11.11
                1.61        18.18
Total           1987        22          2009

Table of Amersham by Affymetrix

                Affymetrix
Amersham        No          Yes         Total
No              1933        20^         1953
                98.98       1.02
                97.28       90.91
Yes             54^         2           56
                96.43       3.57
                2.72        9.09
Total           1987        22          2009

Table of Amersham by Agilent

                Agilent
Amersham        No          Yes         Total
No              1933        20^         1953
                98.98       1.02
                97.97       55.56
Yes             40^         16*         56
                71.43       28.57
                2.03        44.44
Total           1973        36          2009

Platform A   Platform B   Exact P-value for McNemar's Test   Fisher's Exact Test P-value (Right)
Agilent      Affymetrix   0.0649                             0.0005
Amersham     Affymetrix   <.0001                             0.1236
Amersham     Agilent      0.0135                             <.0001

* significant non-random association
^ significantly different detection rates



Supplemental Table 8: Contingency tables of differential gene expression classifications using a mixed-model
nested ANOVA, an alpha cut-off of .01, and a 2-fold minimum criterion. In each cell the frequency, row percentage,
and column percentage are reported, in that order. Probability values from Fisher's Exact Test of association and
McNemar's test of agreement are also reported.



Table of Agilent by Affymetrix

                Affymetrix
Agilent         No          Yes         Total
No              1878        65          1943
                96.65       3.35
                97.56       77.38
Yes             47          19*         66
                71.21       28.79
                2.44        22.62
Total           1925        84          2009

Table of Amersham by Affymetrix

                Affymetrix
Amersham        No          Yes         Total
No              1794        62^         1856
                96.66       3.34
                93.19       73.81
Yes             131^        22*         153
                85.62       14.38
                6.81        26.19
Total           1925        84          2009

Table of Amersham by Agilent

                Agilent
Amersham        No          Yes         Total
No              1824        32^         1856
                98.28       1.72
                93.88       48.48
Yes             119^        34*         153
                77.78       22.22
                6.12        51.52
Total           1943        66          2009

Platform A   Platform B   Exact P-value for McNemar's Test   Fisher's Exact Test P-value (Right)
Agilent      Affymetrix   0.1078                             <.0001
Amersham     Affymetrix   <.0001                             <.0001
Amersham     Agilent      <.0001                             <.0001

* significant non-random association
^ significantly different detection rates



Supplemental Table 9: Contingency tables of differential gene expression classifications using a mixed-model
nested ANOVA and an alpha cut-off of .001. The Affymetrix data were normalized using dChip. In each cell the
frequency, row percentage, and column percentage are reported. Probability values from Fisher's Exact Test of
association and McNemar's test of agreement are also reported.



Table of Agilent by Affymetrix

                Affymetrix
Agilent         No          Yes         Total
No              1922        20^         1942
                98.97       1.03
                97.12       66.67
Yes             57^         10*         67
                85.07       14.93
                2.88        33.33
Total           1979        30          2009

Table of Amersham by Affymetrix

                Affymetrix
Amersham        No          Yes         Total
No              1875        17^         1892
                99.10       0.90
                94.74       56.67
Yes             104^        13*         117
                88.89       11.11
                5.26        43.33
Total           1979        30          2009

Platform A   Platform B   Exact P-value for McNemar's Test   Fisher's Exact Test P-value (Right)
Agilent      Affymetrix   <.0001                             <.0001
Amersham     Affymetrix   <.0001                             <.0001

* significant non-random association
^ significantly different detection rates



Supplemental Table 10: Contingency tables of differential gene expression classifications using a mixed-model
nested ANOVA and an alpha cut-off of .001. The Affymetrix data were normalized using RMA. In each cell the
frequency, row percentage, and column percentage are reported. Probability values from Fisher's Exact Test of
association and McNemar's test of agreement are also reported.



Table of Agilent by Affymetrix

                Affymetrix
Agilent         No          Yes         Total
No              1914        28^         1942
                98.56       1.44
                97.11       73.68
Yes             57^         10*         67
                85.07       14.93
                2.89        26.32
Total           1971        38          2009

Table of Amersham by Affymetrix

                Affymetrix
Amersham        No          Yes         Total
No              1866        26^         1892
                98.63       1.37
                94.67       68.42
Yes             105^        12*         117
                89.74       10.26
                5.33        31.58
Total           1971        38          2009

Platform A   Platform B   Exact P-value for McNemar's Test   Fisher's Exact Test P-value (Right)
Agilent      Affymetrix   0.0022                             <.0001
Amersham     Affymetrix   <.0001                             <.0001

* significant non-random association
^ significantly different detection rates



Supplemental Table 11: Classifications based on Affymetrix data normalized with dChip, RMA, and MAS5.
Contingency tables of differential gene expression classifications using a mixed-model nested ANOVA and an alpha
cut-off of .001. In each cell the frequency, row percentage, and column percentage are reported. Probability values
from Fisher's Exact Test of association and McNemar's test of agreement are also reported.



Table of RMA by MAS5

                MAS5
RMA             No          Yes         Total
No              1950        21          1971
                98.93       1.07
                98.73       61.76
Yes             25          13*         38
                65.79       34.21
                1.27        38.24
Total           1975        34          2009

Table of dChip by RMA

                RMA
dChip           No          Yes         Total
No              1952        27          1979
                98.64       1.36
                99.04       71.05
Yes             19          11*         30
                63.33       36.67
                0.96        28.95
Total           1971        38          2009

Table of dChip by MAS5

                MAS5
dChip           No          Yes         Total
No              1950        29          1979
                98.53       1.47
                98.73       85.29
Yes             25          5*          30
                83.33       16.67
                1.27        14.71
Total           1975        34          2009

Platform A   Platform B   Exact P-value for McNemar's Test   Fisher's Exact Test P-value (Right)
RMA          MAS5         0.6587                             <.0001
dChip        MAS5         0.6835                             0.0001
dChip        RMA          0.3020                             <.0001

* significant non-random association



Supplemental Figure Legends 

Supplemental Figure 1: Scatter plots of log intensity values of the first and second experimental replicates. 
Supplemental Figure 2: Scatter plots of log intensity values of the first and second biological replicates. 
Supplemental Figure 3: Principal Components Analysis (PCA) of the data indicated that variation of signal values 
across microarray technologies was greater than signal variation caused by experimental treatment. 



